Abstract



This work proposes a complete algebraic model for classical informa- 
tion theory. As a precursor the essential probabilistic concepts have been 
defined and analyzed in the algebraic setting. Examples from probabil- 
ity and information theory demonstrate that in addition to theoretical 
insights provided by the algebraic model one obtains new computational 
and analytical tools. Several important theorems of classical probability
and information theory are formulated and proved in the algebraic frame- 
work. 

1 Introduction 

The present paper proposes an algebraic model of classical information theory
and carries out a detailed investigation of the model. Connections between
operator algebras and information theory, both classical and quantum, have
appeared in the scientific literature since the beginnings of both subjects
(see e.g. [Ume62, Seg60, Lin74, Ara75, Key02, BKK07, KW06]). The standard
formulation of classical information theory [Ash90, CT99], on the other hand,
is sometimes seen as an important application of probability theory. Thus
probabilistic
concepts like distribution function, conditional expectation and independence 
are vital for the development of information theory. Most previous work,
including the papers mentioned above, focuses on particular aspects of
information theory, especially noncommutative generalizations of the concept of
entropy, and for specific probabilistic concepts it often resorts to a
representation on some Hilbert space. As a consequence, there does not appear to
be a unified coherent approach based on intrinsically algebraic notions. The
construction of such a model is one of the goals of the paper. As probabilistic
concepts play such an important role in the development of information theory,
we devote a fairly large section to an algebraic approach to probability. It was
I. E. Segal [Seg54], one of the major players in the early development of
operator theory, who first proposed such an algebraic approach to probability
theory. Although we have mostly restricted ourselves to the discrete case, which
is sufficient for our models of communication and information processes, our
proposed model is different from Segal's. We believe several aspects of our
approach are novel (see the section-wise synopsis below) and yield deeper
insights into information processes.

A strong motivation for this paper is the relatively young field of quantum 
information theory. It is almost folklore that in quantum mechanics we are 
forced to deal with noncommutative entities. Thus, the language of C* algebras,
already known to physicists for decades [Haa92, Emc84] as "the algebra
of observables" on which many extensions of classical probabilistic concepts can 
be made, became a natural setting for quantum information. As a complex 
quantum information scheme or protocol has several classical components (e.g. 
classical communication, coin-tosses etc.) it is important that we have a unified 
model and a single language for quantum and classical information. Such a
formulation will be of great help in the difficult task of protocol analysis.
Besides, a unified framework will be of significant advantage for theoretical
analysis. For example, a deeper study of quantum phenomena like (no) quantum
broadcasting [BBLW07], quantum Huffman coding [BFGL00] and channel capacity
[Sch96], to name a few, would benefit from the investigation of these
structures. In this
framework we may view a classical process as a special type of process described 
by commuting elements. Therefore, it seems appropriate to investigate this spe- 
cial case first. As we will show the classical structure is quite rich and sheds 
new light on some familiar aspects of information theory. There is yet another 
reason. In quantum mechanics we have several examples of observables taking 
only a finite number of values (the spectrum is finite) . But in classical mechanics 
all variables take on a continuum of values. Therefore, we often see statements 
like "a finite-dimensional operator like spin is a purely quantum phenomenon 
that has no classical analogue". However, when we talk about information
systems, finite-dimensional quantum systems have obvious classical analogues. A
2-dimensional quantum "source" corresponds to a classical binary source. Our
investigations raise some questions about the possibility of an alternative
formulation of probability theory with a more algebraic flavour [Seg54]. This is
interesting in itself. But it is a side issue in this paper and will only be briefly 
commented upon. Since our main concern is the mathematical models of infor- 
mation processing systems we will be primarily dealing with discrete systems, 
thus circumventing some tricky topological issues. 

Let us recall a simple model of a communication system proposed by Shannon
[Sha48, SW49]. This model has essentially four components: source, channel,
encoder/decoder and receiver. The source could represent very different kinds of
objects: a human speaker, a radar antenna or a distant star. We usually have
some model of the source. The coding/decoding operation is required for three
basic reasons: i) the source/receiver alphabet and the channel alphabet may be
different, ii) to maximize the rate of information communication and iii) to
detect and correct errors due to noise and distortion. Some amount of noise
affects every stage of the operation, so the behavior of the components is
generally modeled as a stochastic process. This is valid in classical and,
often, in quantum communication processes. The difference, of course, is in the
description of the two processes. As in any stochastic process, we specify the
source by a family $X_t$ of random variables, and the various stages of the
communication system are modeled as (stochastic) transformations of these
variables. The parameter $t$ can be continuous or discrete. In this work our
primary focus will be
on discrete processes corresponding to discrete time. Thus, a discrete source can 
be viewed as a generator of a countable set of random variables. Let us suppose 
the source "tosses" a coin and sends a 1 if it is "heads" and a 0 otherwise. We
may model this by a pair of random variables $\{X_h, X_t\}$ on the probability
space $\{h$ (heads), $t$ (tails)$\}$ such that $X_h(h) = X_t(t) = 1$ and
$X_h(t) = X_t(h) = 0$. If the coin is unbiased we say that the state of the
source is given by the probability measure $\{1/2, 1/2\}$. In general it is
$\{p, q\}$ where $0 \le p = 1 - q \le 1$ is the probability of heads. This
simple model can be generalized to more complicated
sources. Besides these elementary random variables we encounter functions of 
these variables. Thus we are led to study algebras of random variables. Recall 
the standard definition of a random variable: it is a (measurable) function on a 
probability space S. Hence, in the standard formulation we need a probability 
space or sample space to define our random variables or "observables". Let us
also recall that a probability space is a triple $(S, \mathcal{M}, \mu)$, where
$S$ is a set, the set of elementary or atomic events, $\mathcal{M}$ is a
$\sigma$-algebra of subsets of $S$ and $\mu$ is the probability measure. Thus if
$\{A_n\}$ is a sequence of mutually disjoint elements from $\mathcal{M}$ then

$$\mu\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \sum_{n=1}^{\infty} \mu(A_n).$$

Moreover, $\mu(S) = 1$ and $\mu(B) \ge 0$ for any $B \in \mathcal{M}$. These are
essentially the Kolmogorov axioms. A real or complex valued random variable is a
measurable function $S \to \mathbb{R}$ or $S \to \mathbb{C}$. Here measurability
is with respect to the Borel $\sigma$-algebra of $\mathbb{R}$ or $\mathbb{C}$.
We recall that the Borel $\sigma$-algebra of any topological
space is generated by its open sets. So in some sense in this formulation the 
probability space is fundamental and the notion of random variables is based 
on the former. However, from the point of view of an observer or experimenter
the random variables are the basic entities because these are precisely the
observables.
In statistical theories like information theory it is the set of random variables 
and their distributions and transformations which are of primary interest. Of 
course, to compute the probability distributions of the random variables we 
have to appeal to the original probability space. But once the distributions 
have been determined they suffice for almost all computations, and the
underlying probability or sample space plays little role. The fundamental
theorem of Kolmogorov [Bil95, Shi84] guarantees that given a set of random
variables and their distributions satisfying certain consistency conditions we
can reconstruct a probability space giving these distributions. These
observations suggest that we take the algebra of random variables or observables
as our primary structure and derive all relevant quantities from this structure.
One of the advantages is that we deal with smaller spaces restricted to
quantities of interest. In the modeling of security protocols this is a more
realistic approach since different participants have access to different sets of
observables and may assign different probability structures on the same set of
events.
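
The coin-toss source described above can be made concrete in a few lines. The
following is only a minimal sketch, assuming the two-point sample space
{h, t}; the function names (`add`, `mul`, `expectation`) and the bias
p = 1/3 are illustrative choices that do not come from the paper. It shows that
the random variables form an algebra under pointwise operations and that a
state assigns expectation values to them.

```python
from fractions import Fraction

# Sample space of the coin toss: h (heads), t (tails).
SAMPLE_SPACE = ("h", "t")

# The two elementary random variables of the text, stored as value tables.
X_h = {"h": 1, "t": 0}
X_t = {"h": 0, "t": 1}


def add(f, g):
    """Pointwise sum of two random variables."""
    return {s: f[s] + g[s] for s in SAMPLE_SPACE}


def mul(f, g):
    """Pointwise product of two random variables."""
    return {s: f[s] * g[s] for s in SAMPLE_SPACE}


def expectation(state, f):
    """Expectation of f in the state (probability measure) `state`."""
    return sum(state[s] * f[s] for s in SAMPLE_SPACE)


# A biased coin with p = 1/3 (an arbitrary illustrative value, q = 1 - p).
p = Fraction(1, 3)
state = {"h": p, "t": 1 - p}

one = add(X_h, X_t)    # the constant random variable 1
zero = mul(X_h, X_t)   # X_h X_t vanishes identically
```

Different observers can be modeled by keeping the same value tables and
changing only `state`, which is the point made above about security protocols.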

In the quantum case there are more fundamental reasons for working with 
the algebras of observables. We will not go into these here. The current work is 
an attempt at formulating (classical) information theory in an algebraic frame- 
work. We will mainly focus on C* and von Neumann algebras. We will see that 
most interesting spaces of observables do have a C* structure. As mentioned 
before, we will be dealing with discrete spaces in this work. We also observe 
that C* algebras have been studied intensively since the pioneering works of 
Murray, von Neumann, Gelfand, Naimark, Segal and others starting from the
1930s. As we stated at the beginning of this section, several probabilistic and
information theoretic concepts like conditional expectation, entropy, differential 
entropy have previously been investigated in the algebraic context. However to 
the best of our knowledge there is no work investigating information and com- 
munication theory in a purely algebraic framework. Our investigations indicate 
that most if not all important concepts and constructs of information theory can 
be dealt with in the algebraic framework. The paper is structured as follows. 

In Section 2 we give the basic definitions of the algebras of interest. This
section is fairly detailed as we provide proofs of several structure theorems 
for finite-dimensional abelian C* algebras and their tensor products, possibly 
infinite. There are two reasons for this. The first is to make the paper as 
self-contained as possible. The second reason is to demonstrate the power and 
utility of the algebraic techniques. Moreover, we believe that in these special 
cases some of the proofs are new. We also give several examples. 

Section 3 gives an account of probabilistic concepts from an algebraic
perspective. In particular, we investigate the fundamental notion of
independence and demonstrate how it relates to the algebraic structure. We note
that there is a very sophisticated theory of noncommutative or "free
probability" [VDN92].
Our approach in the simpler commutative case is different in several aspects. 
One important point in which our approach seems novel is the definition of 
a probability distribution function. The definition we give is algebraic in the 
sense that it depends on the intrinsic properties of the algebra. Specifically, we 
define a probability distribution function as the weak limit of a net or sequence 
of elements in a subalgebra representing an approximate identity of an ideal or 
a subalgebra. To illustrate the practical use of these techniques we give some 
typical examples from standard probability theory. The problem of "waiting 
time" shows that the algebraic approach can offer new techniques and insights. 
Finally, using the definition of distribution function and some other constructs 
we formulate and prove some of the basic limit theorems in this framework. 
These are used later in proving results in information theory. 

In Section 4 we give a precise algebraic model of an information communication
system. The fundamental concept of entropy is introduced as a limiting value of
typical sequences of the algebra. The notion of typical sequence comes from the
limit theorems. In the conventional approach the limit is taken in probability
(convergence in measure). In our algebraic case it corresponds to a weak limit.
The point is that we can do all this in a purely algebraic setting. We also
define and study the crucial notion of a channel. In particular, the channel
coding theorem is presented as an approximation result. Stated informally,

Every channel other than the useless ones can be approximated by a lossless 
channel under appropriate coding. 

In the final section we summarize our constructions and discuss future work. 

2 Algebraic Preliminaries 

An algebra $A$ is a vector space over a field $F$ with an associative bilinear
product $A \times A \to A$. We take $F = \mathbb{C}$, the field of complex
numbers. We deal mostly with unital algebras, that is, algebras with a unit 1.
A Banach algebra is an algebra with a non-negative real function $\|\cdot\|$ on
$A$ such that

    $\|x\| \ge 0$ and $\|x\| = 0$ iff $x = 0$
    $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality)
    $\|xy\| \le \|x\|\,\|y\|$ (Banach property)

and $A$ is complete in the topology defined by the norm. A C* algebra $B$ is
a Banach algebra with an anti-linear involution * (a map $\sigma$ is an
involution if $\sigma^2 = 1$; it is antilinear if
$\sigma(x + cy) = \sigma(x) + \bar{c}\,\sigma(y)$, $c$ a complex number) such
that

$$\|xx^*\| = \|x\|^2 \quad\text{and}\quad (xy)^* = y^*x^* \quad \forall x, y \in B.$$

This implies that $\|x\| = \|x^*\|$. The quintessential examples of a C* algebra
are the norm-closed subalgebras of $\mathcal{L}(H)$, the set of bounded
operators on a Hilbert space $H$. The fundamental Gelfand-Naimark-Segal (GNS)
theorem states that every C* algebra can be isometrically embedded in some
$\mathcal{L}(H)$. The notion of the spectrum of an operator has an algebraic
analogue without reference to the representation space. The resolvent of an
element $x$ in the C* algebra $B$ is the set $R(x) \subset \mathbb{C}$ of all
$\lambda$ such that $\lambda 1 - x$ is invertible. The spectrum $sp(x)$ is the
complement of the resolvent. The spectrum is a nonempty closed and bounded
subset and hence compact. Define
$r(x) = \sup\{|\lambda| : \lambda \in sp(x)\}$, the spectral radius. A basic
result states that

$$r(x) = \lim_{n \to \infty} \|x^n\|^{1/n}.$$

An element $x$ is self-adjoint if $x = x^*$, normal if $x^*x = xx^*$ and
positive (strictly positive) if $x$ is self-adjoint and
$sp(x) \subset [0, \infty)$ ($(0, \infty)$). A self-adjoint element has a real
spectrum and conversely. Since $x = (x + x^*)/2 + i(x - x^*)/2i$, any element of
a C* algebra can be decomposed into self-adjoint "real" ($(x + x^*)/2$) and
"imaginary" ($(x - x^*)/2i$) parts. For a self-adjoint element $x$,
$r(x) = \|x\|$. Thus a positive element is self-adjoint. The positive elements
define a partial order on $A$. Thus $x \le y$ iff $y - x \ge 0$ (positive). An
important property of positive elements is that they have unique square-roots.
Thus if $a \ge 0$ there is a unique element $b \ge 0$ such that $b^2 = a$. We
write $\sqrt{a}$ or $a^{1/2}$ for the square-root. Since $x^*x \ge 0$ it has a
unique square-root. If $x$ is normal we write $|x| = \sqrt{x^*x}$. In
particular, if $x$ is self-adjoint, $|x| = \sqrt{x^2}$. A self-adjoint element
$x$ has a decomposition $x = x_+ - x_-$ into positive and negative parts where
$x_+ = (|x| + x)/2$ and $x_- = (|x| - x)/2$ are positive. An element $p \in B$
is a projection if $p$ is self-adjoint and $p^2 = p$. Given two C*-algebras $A$
and $B$ a
homomorphism $F$ is a linear map preserving the product and * structures. It is
continuous iff bounded. A continuous isomorphism of C* algebras is an isometry
(norm preserving). A homomorphism is positive if it maps positive elements to
positive elements. A (linear) functional on $A$ is a linear map
$A \to \mathbb{C}$. The GNS construction starts with a positive functional
(mapping positive elements to non-negative numbers) on $B$. The details may be
found in [KR97, Tak02]. A positive functional $\omega$ such that
$\omega(1) = 1$ is called a state. The set of states $\mathscr{S}$ is convex.
The extreme points are called pure states and $\mathscr{S}$ is the convex
closure of the pure states (Krein-Milman theorem). A set $B \subset A$ is called
a subalgebra if it is a C* algebra with the inherited product. That is, it is a
subalgebra in the algebraic sense and it is closed in the norm topology. A
subalgebra $B$ is called unital if it contains the identity of $A$. Our primary
interest will be on abelian
(also called commutative) algebras. The structure theory is a bit different in this 
case. Of course, the GNS construction is valid and the elements of the algebra 
act as multiplication operators on the representing Hilbert space. However, 
there is an alternative representation in the abelian case due to Gelfand and 
Naimark which will be of primary interest to us. To motivate it consider an 
example. 

Let $X$ be a compact Hausdorff topological space, for example, a closed and
bounded set in $\mathbb{R}^n$. Let $C(X)$ denote the space of continuous complex
functions on $X$. It includes the constant functions. If we define addition and
multiplication point-wise

$$(f + g)(x) = f(x) + g(x), \quad (fg)(x) = f(x)g(x) \quad\text{and}\quad \|f\| = \sup_{x \in X} |f(x)| \quad \forall f, g \in C(X) \qquad (1)$$

then $C(X)$ becomes a complex Banach algebra. If we define
$f^*(x) = \overline{f(x)}$ then $C(X)$ is an abelian C* algebra. This is a
prototype of abelian C* algebras [KR97]. One can generalize to (essentially)
bounded measurable functions on measure spaces with appropriate norm. However,
for the purposes of this paper it suffices to consider compact spaces with
measures defined on Borel $\sigma$-algebras. We will dwell more on this point in
the next section. A complex function (not necessarily continuous) is called
simple if its range is finite. For example, the indicator function $I_S$ of a
subset $S \subset X$, given by $I_S(x) = 1$ if $x \in S$ and 0 otherwise, is a
simple function. It is continuous only if $S$ is both open and closed in $X$,
for example if $S = X$ or $S$ is an open connected component. Simple functions
play a crucial role in probability and integration theory. From their definition
it follows that the projections in $C(X)$ are precisely the continuous indicator
functions. The constant functions 1 and 0 are both projections, corresponding to
$S = X$ and $S = \emptyset$ respectively. These are the only projections in
$C(X)$ if $X$ is connected. The basic structure theorem for abelian C* algebras
is the following.

Theorem 1. An abelian C* algebra with unity is isomorphic to the algebra
$C(X)$ for a compact Hausdorff space $X$. The isomorphism is an isometry (norm
preserving).

The main idea of the proof comes from the following observation. In any
function algebra $C(X)$, for $p \in X$ the map $\sigma_p : f \mapsto f(p)$,
$f \in C(X)$, is a linear functional on $C(X)$. These are multiplicative
functionals in the sense that $\sigma_p(fg) = \sigma_p(f)\sigma_p(g)$. In fact
these are the only possible multiplicative functionals. The Gelfand
representation for an abstract abelian C* algebra $A$ identifies the space $X$
as the set of multiplicative functionals and gives it a topology to make these
continuous. The details can be found in [KR97].

Now let $X = \{a_1, \ldots, a_n\}$ be a finite set with discrete topology. Then
$A = C(X)$ is the set of all functions $X \to \mathbb{C}$. The algebra $C(X)$
can be considered as the algebra of (complex) random variables on the finite
probability space $X$. Let $x_i(a_j) = \delta_{ij}$, $i, j = 1, \ldots, n$.
Here $\delta_{ij} = 1$ if $i = j$ and 0 otherwise. The functions $x_i \in A$
form a basis for $A$. Their multiplication table is particularly simple:
$x_i x_j = \delta_{ij} x_i$. They also satisfy $\sum_i x_i = 1$. These are
projections in $A$. They are orthogonal in the sense that $x_i x_j = 0$ for
$i \neq j$. We call any basis consisting of elements of norm 1 with distinct
elements orthogonal atomic. A set of linearly independent elements $\{y_i\}$
satisfying $\sum_i y_i = 1$ is said to be complete. The next theorem gives us
the general structure of any finite-dimensional algebra.

Theorem 2. Let $A$ be a finite-dimensional abelian C* algebra. Then there is a
unique (up to permutations) complete atomic basis $B = \{x_1, \ldots, x_n\}$.
That is, the basis elements satisfy

$$x_i^* = x_i, \quad x_i x_j = \delta_{ij} x_i, \quad \|x_i\| = 1 \quad\text{and}\quad \sum_i x_i = 1. \qquad (2)$$

Let $x = \sum_i a_i x_i \in A$. Then $sp(x) = \{a_i\}$ and hence
$\|x\| = \max_i\{|a_i|\}$.

Proof. Let $\{y_1, \ldots, y_n\}$ be a basis for $A$. Since the self-adjoint
elements $(y_i + y_i^*)/2$ and $i(y_i - y_i^*)/2$ span $A$ we can choose an
independent set. Hence, we may assume that the $y_i$ are self-adjoint. Then each
$y_i^2$ is positive and hence possesses a square-root $|y_i|$. Moreover,
$|y_i| \pm y_i \ge 0$ [KR97, Bra02] (see footnote 1). We can therefore write
each $y_i = (|y_i| + y_i)/2 - (|y_i| - y_i)/2$ as the difference of two positive
elements. Again, choosing an independent set we may assume that the $y_i$
themselves are positive with norm 1. Let
$S = \{z : z \ge 0 \text{ and } \|z\| \le 1\}$. $S$ is convex and compact (being
closed and bounded) and $y_i \in S$. Hence, by the Krein-Milman theorem [KR97]
$S$ is the convex closure of its extreme points (see footnote 2). We may assume
that these extreme points have norm 1 (obviously discarding 0). Since each $y_i$
can be written as a finite convex sum of extreme points we can pick a basis
$x_1, \ldots, x_n$ of extreme points. We complete the proof by showing that the
$x_i$'s satisfy equations (2) and that they are unique.

1 Bratteli (p. 35) gives a proof which does not use the Gelfand representation.
2 Recall that extreme points of a convex set are those which cannot be written
as a non-trivial convex combination of some members of the set.

Now $\|x_i\| = 1$ implies that for any $|\lambda| > 1$,
$\lambda - x_i = \lambda(1 - \lambda^{-1}x_i)$ is invertible. This can be proved
by using the geometric series for $(1 - \lambda^{-1}x_i)^{-1}$. Hence if
$a \in sp(x_i)$ then $0 \le a \le 1$ and $1 - x_i$ is positive. Since
$sp(x_i - x_i^2) = \{a - a^2 : a \in sp(x_i)\}$ and $a - a^2 \ge 0$ it follows
that $x_i - x_i^2 \ge 0$. As $x_i = (2(x_i - x_i^2) + 2x_i^2)/2$ is a convex
combination of two positive elements in $S$ and $x_i$ is a non-zero extreme
point, we must have $x_i - x_i^2 = 0$ or $x_i - x_i^2 = x_i^2$. The last
possibility is ruled out because it would imply
$\|x_i\| = 2\|x_i^2\| = 2\|x_i\|^2 = 2$. Hence $x_i^2 = x_i$. To prove that they
are orthogonal observe that $x_i - x_i x_j = x_i(1 - x_j)$ is positive. Thus
$x_i = (2x_i(1 - x_j) + 2x_i x_j)/2$ is a convex combination of points in $S$.
Hence, as before, either $x_i x_j = 0$ or $x_i = x_i x_j$. With $x_j$ in place
of $x_i$ we conclude that $x_i x_j = 0$ or $x_j = x_i x_j$. Thus the only
possibility for $x_i \neq x_j$ is that $x_i x_j = 0$.

To prove the decomposition property let $1 = \sum_i a_i x_i$. Squaring and using
the orthogonality of the $x_i$'s we conclude that $a_i = 1$ or 0. If some
$a_k = 0$ then $x_k = x_k 1 = x_k \sum_i a_i x_i = 0$. Hence, all $a_i = 1$.
Finally, let $\{z_i\}$ be another basis satisfying (2). Let
$z_i = \sum_j b_{ij} x_j$. As before, $b_{ij} = 1$ or 0 and the matrix
$(b_{ij})$ is a 0-1 matrix. For fixed $i$ let $T_i$ be the set of integers $j$
such that $b_{ij} = 1$. Then $z_i z_j = 0$, $i \neq j$, implies that $T_i$ and
$T_j$ are disjoint. This along with the last condition in (2) implies that the
$T_i$'s form a partition of the set $\{1, \ldots, n\}$. Thus each $T_i$ is a
singleton and the matrix $(b_{ij})$ has exactly one 1 in each row and column. It
is a permutation matrix.

Let $x = \sum_i a_i x_i$ be an element of $A$. Then
$\lambda 1 - x = \sum_i (\lambda - a_i) x_i$. This is invertible iff
$\lambda \neq a_i$, $i = 1, \ldots, n$, with inverse
$\sum_i (\lambda - a_i)^{-1} x_i$. The proof is complete. □

Let us observe that we could have proved the theorem using the Gelfand 
representation. But the above proof is more intrinsic, depending mostly on the
structure of the algebra itself.
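
Theorem 2 is easy to check numerically in the model case $A = C(X)$ with $X$ a
finite set. The sketch below is illustrative only: it assumes that elements of
the algebra are stored as tuples of values on an n-point space, and all
function names are hypothetical helpers, not notation from the paper. It
verifies the relations (2), the formula for the spectrum, the norm and the C*
identity, and the inverse of $\lambda 1 - x$ from the last step of the proof.

```python
def atomic_basis(n):
    """Indicator functions x_1, ..., x_n of the points of an n-point space."""
    return [tuple(1 if i == j else 0 for j in range(n)) for i in range(n)]


def add(x, y):
    return tuple(a + b for a, b in zip(x, y))


def mul(x, y):
    return tuple(a * b for a, b in zip(x, y))


def scale(c, x):
    return tuple(c * a for a in x)


def star(x):
    """The involution: pointwise complex conjugation."""
    return tuple(complex(a).conjugate() for a in x)


def norm(x):
    """The sup norm, which equals max_i |a_i| for x = sum_i a_i x_i."""
    return max(abs(a) for a in x)


def spectrum(x):
    """For a finite space the spectrum is the set of values taken by x."""
    return set(x)


def combine(coeffs, basis):
    """The element sum_i a_i x_i."""
    total = (0,) * len(basis)
    for a, e in zip(coeffs, basis):
        total = add(total, scale(a, e))
    return total


def resolvent_inverse(lam, x):
    """Inverse of lam*1 - x, or None when lam lies in the spectrum of x."""
    diffs = [lam - a for a in x]
    if any(d == 0 for d in diffs):
        return None
    return tuple(1 / d for d in diffs)


basis = atomic_basis(3)
coeffs = (2, -1j, 0.5)        # arbitrary illustrative coefficients
x = combine(coeffs, basis)
```

Here `resolvent_inverse(2, x)` returns `None` because 2 is one of the
coefficients, in agreement with $sp(x) = \{a_i\}$.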

Corollary 1. Let $A$ be an abelian C*-algebra satisfying the following
conditions. There are finite-dimensional subalgebras $A_k$, $k = 0, 1, \ldots$
with

$$A_0 \subset A_1 \subset A_2 \subset \cdots \subset A$$

and for each $k$ corresponding to $A_k$ there is a complementary subalgebra
$A'_k \subset A_{k+1}$ such that $A_k A'_k = A_{k+1}$,
$A_k \cap A'_k = \{0, 1\}$ and $x \in A_k$, $y \in A'_k$ implies $xy \neq 0$
unless $x$ or $y$ is 0. Then there is a countable basis for $A$ satisfying the
first three equations in (2).

Proof. We prove by induction. The case of $A_0$ is proved in the theorem. Assume
we have an atomic basis $\{y^n_1, \ldots, y^n_{k_n}\}$ for $A_n$. There is a
(unique) atomic basis $\{x^n_1, \ldots, x^n_{m_n}\}$ in $A'_n$. It is now a
routine matter to show that
$\{x^n_m y^n_k : 1 \le k \le k_n \text{ and } 1 \le m \le m_n\}$ form a basis in
$A_{n+1}$. □

The conditions in the corollary can be slightly weakened by requiring that
there be embeddings (injective algebra homomorphisms)
$\alpha_k : A_k \to A_{k+1}$ and $\alpha'_k : A'_k \to A_{k+1}$ such that the
images $\alpha_k(A_k)$ and $\alpha'_k(A'_k)$ satisfy the conditions stated. Such
a structure will appear in the tensor product of algebras to be defined below.
They play an important role in our modeling of information and communication
systems. Let us also note that the basis structure in Theorem 2 may be used to
define a finite dimensional C* algebra abstractly.

2.1 Tensor products 

We next describe an important construction for C* algebras. Given two C*
algebras $A$ and $B$, the tensor product $A \otimes B$ is defined as follows. As
a set it consists of all finite linear combinations of symbols of the form
$\{x \otimes y : x \in A, y \in B\}$ subject to the conditions that for all
$x, u \in A$, $y, z \in B$ and $c \in \mathbb{C}$,

$$(cx) \otimes y = x \otimes (cy) = c(x \otimes y)$$
$$(x + u) \otimes y = x \otimes y + u \otimes y \quad\text{and}\quad x \otimes (y + z) = x \otimes y + x \otimes z.$$

Thus the tensor product is bilinear. There are no other relations. Note that by
definition the products of the form $x \otimes y$ span $A \otimes B$. Hence, if
$\{x_i\}$ and $\{y_j\}$ are bases for $A$ and $B$ respectively then
$\{x_i \otimes y_j\}$ is a basis for $A \otimes B$. The linear space
$A \otimes B$ becomes an algebra by defining
$(x \otimes y)(u \otimes z) = xu \otimes yz$ and extending by bilinearity.
Explicitly,

$$\sum_i a_i (x_i \otimes y_i) \sum_j b_j (u_j \otimes z_j) = \sum_{ij} a_i b_j (x_i u_j \otimes y_i z_j).$$

The * is defined by $(x \otimes y)^* = x^* \otimes y^*$ and extending
anti-linearly. The problem is defining the norm since it is not a linear
function. In fact, for general C* algebras there could be a number of
inequivalent norms on different completions of $A \otimes B$. This problem of
non-uniqueness, however, does not exist if one of the
factors is abelian or finite-dimensional. Since, in this work we will be primarily 
concerned with abelian algebras this point will not be discussed further. Our 
basic model will be an infinite tensor product of finite dimensional C* algebras 
which we present next. 
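
For finite-dimensional abelian algebras the tensor product has a very concrete
form: if $A$ and $B$ are function algebras on finite sets, $x \otimes y$ can be
stored as the table of products $x(i)\,y(j)$. The following sketch assumes this
tuple-of-tuples representation (an illustrative choice, with hypothetical helper
names) and checks bilinearity, the product rule
$(x \otimes y)(u \otimes z) = xu \otimes yz$ and the fact that products of
atomic basis elements again form a complete atomic basis.

```python
def atomic_basis(n):
    """Indicator functions of the points of an n-point space."""
    return [tuple(1 if i == j else 0 for j in range(n)) for i in range(n)]


def vec_add(x, y):
    return tuple(a + b for a, b in zip(x, y))


def vec_mul(x, y):
    return tuple(a * b for a, b in zip(x, y))


def tensor(x, y):
    """Elementary tensor of x and y as the table (i, j) -> x[i] * y[j]."""
    return tuple(tuple(a * b for b in y) for a in x)


def t_add(s, t):
    """Sum of two elements of the tensor product (entrywise)."""
    return tuple(tuple(a + b for a, b in zip(r, q)) for r, q in zip(s, t))


def t_mul(s, t):
    """Product of two elements of the tensor product (entrywise)."""
    return tuple(tuple(a * b for a, b in zip(r, q)) for r, q in zip(s, t))


# Arbitrary illustrative elements of a 2-dimensional algebra A ...
x, u = (1, 2), (3, -1)
# ... and of a 3-dimensional algebra B.
y, z = (2, 0, 5), (1, 4, -2)

# The six products x_i (tensor) y_j of atomic basis elements.
product_basis = [tensor(e, f) for e in atomic_basis(2) for f in atomic_basis(3)]
```

Integer entries are used so that all identities can be compared exactly.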

Let $A_k$, $k = 1, 2, \ldots$, be finite dimensional abelian C* algebras with
atomic bases $B_k = \{x_{k1}, \ldots, x_{kn_k}\}$. Let $B^\infty$ be the set
consisting of all infinite strings of the form
$z_{i_1} \otimes z_{i_2} \otimes \cdots$ where all but a finite number ($> 0$)
of the $z_{i_k}$ are equal to 1 and if some $z_{i_k} \neq 1$ then
$z_{i_k} \in B_k$. Explicitly, $B^\infty$ consists of strings of the form
$z_{i_1} \otimes z_{i_2} \otimes \cdots \otimes z_{i_k} \otimes 1 \otimes 1 \otimes \cdots$,
$k = 1, 2, \ldots$ with $z_{i_j} \in B_j$. Let
$\tilde{\mathfrak{A}} = \bigotimes_{i=1}^{\infty} A_i$ be the vector space with
basis $B^\infty$ such that
$z_{i_1} \otimes z_{i_2} \otimes \cdots \otimes z_{i_k} \otimes \cdots$ is
linear in each factor separately:

$$z_{i_1} \otimes \cdots \otimes (a z_{i_k} + b z'_{i_k}) \otimes z_{i_{k+1}} \otimes \cdots = a(z_{i_1} \otimes \cdots \otimes z_{i_k} \otimes z_{i_{k+1}} \otimes \cdots) + b(z_{i_1} \otimes \cdots \otimes z'_{i_k} \otimes z_{i_{k+1}} \otimes \cdots).$$

Clearly every $a \in \tilde{\mathfrak{A}}$ is a finite linear combination of
elements in $B^\infty$. We define a product in $\tilde{\mathfrak{A}}$ as
follows. First, for elements of $B^\infty$:

$$(z_{i_1} \otimes z_{i_2} \otimes \cdots)(z'_{i_1} \otimes z'_{i_2} \otimes \cdots) = (z_{i_1} z'_{i_1} \otimes z_{i_2} z'_{i_2} \otimes \cdots).$$

We extend the product to the whole of $\tilde{\mathfrak{A}}$ by linearity. Next
define a norm by

$$\Big\| \sum_{i_1, i_2, \ldots} a_{i_1 i_2 \cdots}\, z_{i_1} \otimes z_{i_2} \otimes \cdots \Big\| = \sup\{|a_{i_1 i_2 \cdots}|\}.$$

It is straightforward to show that $B^\infty$ is an atomic basis. It follows
that the above function is indeed an algebra norm and that
$\tilde{\mathfrak{A}}$ is an abelian normed algebra. We also define the
*-operation by

$$\Big( \sum_{i_1, i_2, \ldots} a_{i_1 i_2 \cdots}\, z_{i_1} \otimes z_{i_2} \otimes \cdots \Big)^* = \sum_{i_1, i_2, \ldots} \overline{a_{i_1 i_2 \cdots}}\, z_{i_1} \otimes z_{i_2} \otimes \cdots.$$

It is routine to check that for $x \in \tilde{\mathfrak{A}}$,
$\|xx^*\| = \|x\|^2$. Finally, we complete the norm and call the resulting C*
algebra $\mathfrak{A}$. The completion of a norm is a technical device that uses
the fact that any normed algebra $X$ can be isometrically mapped to a norm
complete algebra (a Banach algebra) $\bar{X}$ such that the image of $X$ is
dense in $\bar{X}$ (see [KR97]) (see footnote 3). With these definitions
$\mathfrak{A}$ is a C* algebra. An important special case is when all the factor
algebras $A_i = A$. We then write the infinite tensor product C* algebra as
$\bigotimes^\infty A$. Intuitively, the elements of an atomic basis $B^\infty$
of $\bigotimes^\infty A$ correspond to strings from an alphabet (represented by
the basis $B$) with a given prefix. A general element of $\bigotimes^\infty A$
is a limit of linear combinations of elements of $B^\infty$. Of particular
interest is the 2-dimensional algebra $\mathcal{D}$ corresponding to a binary
alphabet. Thus we name $\bigotimes^\infty \mathcal{D}$ the binary algebra. Let
us fix some notation. For any finite dimensional C* algebra $A$ the atomic basis
$B^\infty$ for $\bigotimes^\infty A$ constructed above will be denoted by
$B^\infty_A$ to emphasize the association. The algebras $\bigotimes^\infty A$
will be our model of signals from a source/encoder which are strings (of
arbitrary length) from some alphabet. We next prove a result that is relevant
for coding theory.

Proposition 1. Let $A$ be an abelian C* algebra of dimension $n$ with atomic
basis $B_A = \{x_0, \ldots, x_{n-1}\}$. Let $B_{\mathcal{D}} = \{y_0, y_1\}$ be
the atomic basis of the 2-dimensional algebra $\mathcal{D}$ defined above. Then
there are injective algebra homomorphisms

$$J : \bigotimes^\infty \mathcal{D} \to \bigotimes^\infty A \quad\text{and}\quad J' : \bigotimes^\infty A \to \bigotimes^\infty \mathcal{D}$$

that are isometries.

Proof. We observe that it is sufficient to define an injective set map $j$
(resp. $j'$) from $B^\infty_{\mathcal{D}}$ to $B^\infty_A$ (resp. $B^\infty_A$
to $B^\infty_{\mathcal{D}}$). For we can first extend these to linear maps $J$
(resp. $J'$) on the appropriate spaces. The fact that the bases are atomic will
ensure that these are injective algebra homomorphisms, in fact, isometries. Let

$$j(z_1 \otimes \cdots \otimes z_m \otimes 1 \otimes \cdots) = \phi(z_1) \otimes \cdots \otimes \phi(z_m) \otimes 1 \otimes \cdots$$

where $z_i \in \{y_0, y_1\}$ and $\phi(y_0) = x_0$, $\phi(y_1) = x_1$.

3 There are some delicate convergence issues here. Since $\tilde{\mathfrak{A}}$,
consisting of finite sums of tensor products, is dense in $\mathfrak{A}$ it
often suffices to prove some statement about $\tilde{\mathfrak{A}}$ and extend
it to $\mathfrak{A}$ by continuity.

To construct $j'$ let $k = \lfloor \log_2 n \rfloor$, so that every integer
$0 \le r \le n - 1$ has a binary representation of length at most $k + 1$. For
$0 \le r \le n - 1$ let
$r = b^r_0 + b^r_1 2 + b^r_2 2^2 + \cdots + b^r_k 2^k$ be the binary
representation of $r$ of length $k + 1$ (pad it with 0's if necessary). Let
$\psi : B_A \to B^\infty_{\mathcal{D}}$ be the map defined by

$$\psi(x_r) = y_{b^r_0} \otimes y_{b^r_1} \otimes \cdots \otimes y_{b^r_k}$$

and extend it to a map $j' : B^\infty_A \to B^\infty_{\mathcal{D}}$ by

$$j'(z_1 \otimes \cdots \otimes z_m \otimes 1 \otimes \cdots) = \psi(z_1) \otimes \cdots \otimes \psi(z_m) \otimes 1 \otimes \cdots.$$

The map $j'$ is injective and the proof is complete. □

Let us note that from the injective maps $j$ and $j'$ we can construct a
bijective correspondence between $B^\infty_A$ and $B^\infty_{\mathcal{D}}$ by a
Schroeder-Bernstein type construction (see [Kle52]) and this can be lifted to an
algebra isometry. But for us, the isomorphisms induced by maps like $j$ and $j'$
(these are certainly not unique) will be of greatest interest. Essentially, what
the proposition says is that it is often sufficient to restrict our attention to
the special algebra $\bigotimes^\infty \mathcal{D}$.
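
The map $j'$ in the proof of Proposition 1 is a fixed-length binary code. A
minimal sketch follows, under the assumption that a basis string
$x_{r_1} \otimes \cdots \otimes x_{r_m} \otimes 1 \otimes \cdots$ is represented
by the tuple of indices $(r_1, \ldots, r_m)$ and a binary string by a tuple of
bits; the function names `psi`, `j_prime` and `decode` are illustrative, and
`decode` is an addition used only to demonstrate injectivity.

```python
def psi(r, n):
    """Bits (b_0, ..., b_k) of r, least significant first, padded to k + 1.

    Here k + 1 = floor(log2(n)) + 1, which equals n.bit_length() for n >= 1.
    """
    if not 0 <= r < n:
        raise ValueError("symbol index out of range")
    width = n.bit_length()
    return tuple((r >> i) & 1 for i in range(width))


def j_prime(word, n):
    """Encode a finite word over {0, ..., n-1} symbol by symbol."""
    bits = []
    for r in word:
        bits.extend(psi(r, n))
    return tuple(bits)


def decode(bits, n):
    """Invert j_prime on its image (exists because psi is injective)."""
    width = n.bit_length()
    if len(bits) % width != 0:
        raise ValueError("bit string is not a sequence of whole blocks")
    return tuple(
        sum(b << i for i, b in enumerate(bits[p:p + width]))
        for p in range(0, len(bits), width)
    )


n = 5                          # an alphabet of five symbols, so k + 1 = 3
encoded = j_prime((3, 0, 4), n)
```

Because every symbol is sent to a block of the same length, distinct words have
distinct encodings, which is the injectivity used in the proof.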

The next step is to describe the state space. We recall that states of an
algebra $A$ are precisely the positive functionals $\omega$ that are normalized:
$\omega(1) = 1$.

Given a C* subalgebra $V \subset A$ the set of states of $V$ will be denoted by
$\mathscr{S}(V)$. Let $\mathfrak{A} = \bigotimes_{i=1}^{\infty} A_i$ denote the
infinite tensor product of finite-dimensional algebras $A_i$. An infinite
product state of $\mathfrak{A}$ is a functional of the form

$$\Omega = \omega_1 \otimes \omega_2 \otimes \cdots \quad\text{such that}\quad \omega_i \in \mathscr{S}(A_i).$$

This is indeed a state of $\mathfrak{A}$, for if
$\alpha_k = z_1 \otimes z_2 \otimes \cdots \otimes z_k \otimes 1 \otimes 1 \cdots \in \mathfrak{A}$
then

$$\Omega(\alpha_k) = \omega_1(z_1)\omega_2(z_2) \cdots \omega_k(z_k),$$

a finite product. Since an arbitrary element of $\mathfrak{A}$ is the limit of a
sequence of finite sums of elements of the form $\alpha_k$, $k = 1, 2, \ldots$,
$\Omega$ is bounded by the principle of uniform boundedness. Clearly, it is
positive. A general state on $\mathfrak{A}$ is a convex combination of product
states like $\Omega$.
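
On atomic basis strings the product state is just a product of probabilities.
The sketch below assumes that each factor state $\omega_j$ is given by its
values on the atomic basis of $A_j$ (a probability vector stored as a dict) and
that a basis string is represented by its finite prefix of indices; the helper
names and the bias 1/3 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import prod


def product_state(states, word):
    """Value of Omega on the basis string with prefix `word`.

    states[j][s] is the value of the j-th factor state on the s-th atomic
    basis element; factors beyond the prefix contribute omega_j(1) = 1.
    """
    return prod((states[j][s] for j, s in enumerate(word)), start=Fraction(1))


def all_words(alphabet_size, length):
    """All index tuples of the given length."""
    return product(range(alphabet_size), repeat=length)


# A binary source with identical factors: omega(x_0) = 1/3, omega(x_1) = 2/3.
coin = {0: Fraction(1, 3), 1: Fraction(2, 3)}
states = [coin] * 4

value = product_state(states, (0, 1, 1))
```

The consistency of these values under extending a prefix by one symbol reflects
the relation $z \otimes 1 = \sum_s z \otimes x_s$ in the algebra.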

2.2 Analytic functions on C* algebras 

In this section we discuss another useful construction. Let $A$ be a C* algebra.
Suppose $f(z)$ is an analytic function whose Taylor series $\sum_{n=0}^{\infty} a_n (z - c)^n$ is
convergent in a region $|z - c| < R$. The convergence of the series $\sum_n |a_n| \, \|x - c1\|^n$
for $\|x - c1\| < R$ implies that the series $\sum_{n=0}^{\infty} a_n (x - c1)^n$ converges (we need
completeness of $A$ for this). Thus it makes sense to talk of analytic functions on
a C* algebra. If we have an atomic basis $\{x_1, x_2, \ldots\}$ in an abelian C* algebra
then the functions are particularly simple in this basis. Thus if $x = \sum_i a_i x_i$ then
$f(x) = \sum_i f(a_i) x_i$ provided that the $f(a_i)$ are defined in an appropriate domain. We
will mostly take this as our definition with the understanding that the constant
function $c$ is identified with $c1$.
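
The agreement between the power-series definition and the coefficientwise formula $f(x) = \sum_i f(a_i) x_i$ can be checked numerically. The sketch below assumes a three-dimensional abelian algebra whose elements are coefficient lists in an atomic basis (so multiplication is coefficientwise) and takes $f = \exp$ with $c = 0$; the function names are ours.

```python
import math


def multiply(x, y):
    """Product in the atomic basis: atoms are orthogonal idempotents."""
    return [a * b for a, b in zip(x, y)]


def exp_by_series(x, terms=40):
    """Partial sum of sum_n x^n / n!, computed with the algebra product."""
    one = [1.0] * len(x)
    total, power = one[:], one[:]
    for n in range(1, terms):
        power = multiply(power, x)                 # power now holds x^n
        total = [t + p / math.factorial(n) for t, p in zip(total, power)]
    return total


def apply_function(f, x):
    """f(x) = sum_i f(a_i) x_i, evaluated coefficient by coefficient."""
    return [f(a) for a in x]


x = [0.0, 1.0, -2.0]
series = exp_by_series(x)
direct = apply_function(math.exp, x)
```

Both lists agree to floating-point accuracy, which is the content of taking the coefficientwise formula as the definition.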
3 Algebraic approach to probability



We have observed that discrete signals from a source are modeled by an abelian 
algebra. The elements of the algebra correspond to random variables repre- 
senting the output of the source. With random variables we always associate 
a probability distribution. In the standard treatment of probability theory the 
probability or sample space is introduced first. Random variables are defined 
as (measurable) real (or complex, in general) functions on this space. One then 
finds the probability distributions of the random variables and most important 
quantities like mean, variance and correlations are based on these distributions. 
In particular, the mean or expectation value plays a central role. Note that 
random variables can be added and multiplied making it a real algebra (scalars 
are the constant random variables). Note also that random variables represent
quantities that are actually measured or observed: the voltage across a
resistor, the currents in an antenna, the position of a Brownian particle and so 
on. The probability distribution corresponds to the state of the devices that 
produce these outputs. We will take the alternative view and start with these 
observables as our basic objects. In this way, we single out the objects which are 
relevant to a specific problem. In the following paragraphs we formalize these 
notions. 

3.1 Basic notions 

A classical observable algebra is an abelian complex C* algebra $A$. It is
convenient to use complex algebras. We can restrict our attention to real algebras
whenever necessary. Recall that a state on $A$ is a positive linear functional $\omega$
such that $\omega(1) = 1$. We can identify $\omega$ with a probability measure as follows.
Suppose $(M, S, P)$ is a probability space ($M$ the sample space, $S$ the $\sigma$-algebra,
$P$ the probability measure). Let $L^{\infty}(M, S, P)$ (or simply $L^{\infty}(M)$ if the measure
structure is clear) be the set of essentially bounded measurable complex
functions (see footnote 4). We can give it a C* structure as in the case of $C(X)$, the space of
continuous functions on a compact topological space $X$ (defined earlier), but
using the essential supremum instead of the ordinary supremum. If $B \in S$ then
the indicator function $I_B \in L^{\infty}(M, \mathbb{C})$ and

$$\int_M I_B \, dP = P(B)$$

where the integral is defined in the sense of Lebesgue. Note that $\omega_P(f) = \int f \, dP$
is a positive linear functional on $L^{\infty}(M)$. Since $\omega_P(1) = 1$ it is a state.

Definition 1. A probability algebra is a pair $(A, S)$ where $A$ is an observable
algebra and $S \subset \mathscr{S}(A)$ is a set of states. A probability algebra is defined to be
fixed if $S$ contains only one state. A probability algebra $\mathscr{A}_1 = (A_1, S_1)$ is defined
to be a cover of another $\mathscr{A}_2 = (A_2, S_2)$ if there is an algebra homomorphism
$\phi : A_1 \to A_2$ and a one-to-one correspondence $\gamma : S_1 \leftrightarrow S_2$ such that the
following conditions hold: (i) $\phi$ is onto and (ii) for all $x \in A_1$ and $\omega \in S_1$:
$\omega(x) = \gamma(\omega)(\phi(x))$.

4 A function $f$ is said to be essentially bounded if there is a constant $K$ such that $|f(x)| \le K$
almost everywhere. The essential supremum is the infimum over all such $K$:
$\operatorname{ess\,sup}(|f|) = \inf\{K : P\{x : |f(x)| > K\} = 0\}$.

Let $\omega$ be a state on an abelian C* algebra $A$. Call two elements $x, y \in A$
uncorrelated in the state $\omega$ if $\omega(xy) = \omega(x)\omega(y)$. Note that this definition
depends crucially on the state: the same two elements can be correlated in
some other state $\omega'$. Two natural questions are immediate. Are there any
states for which every pair of elements of $A$ is uncorrelated? Is there a
pair of elements which are uncorrelated in every state? Two trivial candidates
for the second question are 1 and 0. Either of them is uncorrelated to every
element. We implicitly exclude these two trivial cases. Concerning the second
question the answer is negative in general. On the first question, a state $\omega$ is
called multiplicative if $\omega(xy) = \omega(x)\omega(y)$ for all $x, y \in A$. Note that the notion
of positivity defines a partial order on the space of functionals making it an
ordered vector space [KR97]. The set of states, $\mathscr{S}$, is convex in the usual sense
that for numbers $p_i \ge 0$, $\sum_{i=1}^{k} p_i = 1$ and states $\omega_i$, $i = 1, \ldots, k$ the functional
$\sum_i p_i \omega_i$ is also a state. The extreme points of $\mathscr{S}$ are called pure states. In the
case of abelian C* algebras a state is pure if and only if it is multiplicative
[KR97]. Thus in a pure state any two observables are uncorrelated. This is not
generally true in the non-abelian quantum case.

Next we come to the important notion of independence. First, given $S \subset A$
let $A(S)$ denote the subalgebra generated by $S$ (the smallest subalgebra of $A$
containing $S$). Two subsets $S_1, S_2 \subset A$ are defined to be independent if all the
pairs $\{(x_1, x_2) : x_1 \in A(S_1), x_2 \in A(S_2)\}$ are uncorrelated. As independence and
correlation depend on the state we sometimes write $\omega$-independent or $\omega$-uncorrelated
to emphasize this. Clearly, independence is a much stronger condition than
being uncorrelated. It is easy to construct examples in 3 or more dimensions
where a pair of observables $x, y$ are uncorrelated but they are not independent:
for example, $x^2$ and $y$ may be correlated. However, in 2 dimensions $x$ and $y$ are
uncorrelated if and only if one of them is 0 or $c1$. Let us note that as in the
quantum case two dimensions is an exceptional case. The next theorem shows
the structural implications of independence.
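
A concrete instance of the remarks on correlation may help. The sketch below is our own example, not taken from the text: it works in a three-dimensional abelian algebra with the uniform state, where elements are coefficient lists in an atomic basis and a state is a list of weights. The helper `covariance` computes $\omega(xy) - \omega(x)\omega(y)$.

```python
from fractions import Fraction as F


def expectation(weights, z):
    """omega(z) for z given by its coefficients in the atomic basis."""
    return sum(w * c for w, c in zip(weights, z))


def multiply(x, y):
    """Coefficientwise product (atoms are orthogonal idempotents)."""
    return [a * b for a, b in zip(x, y)]


def covariance(weights, x, y):
    """omega(xy) - omega(x) omega(y); zero exactly when x, y are uncorrelated."""
    return expectation(weights, multiply(x, y)) - expectation(weights, x) * expectation(weights, y)


uniform = [F(1, 3)] * 3
x = [-1, 0, 1]
y = [1, -2, 1]
x_squared = multiply(x, x)

cov_xy = covariance(uniform, x, y)             # x and y are uncorrelated
cov_x2y = covariance(uniform, x_squared, y)    # but x^2 and y are correlated

pure = [0, 1, 0]                               # a multiplicative (pure) state
cov_pure = covariance(pure, x_squared, y)

# Two dimensions: the covariance is p(1-p)(a_1-a_2)(b_1-b_2).
cov_2d = covariance([F(1, 4), F(3, 4)], [1, 2], [3, 5])
```

Here $x$ and $y$ are uncorrelated in the uniform state while $x^2 \in A(\{x\})$ and $y$ are not, so $\{x\}$ and $\{y\}$ are not independent. In the pure state every covariance vanishes, and in two dimensions the covariance is non-zero as soon as neither element is a multiple of 1 and the state is not pure.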

Theorem 3. Two sets of observables $S_1, S_2$ in a finite dimensional abelian C*
algebra $A$ are independent in a state $\omega$ if and only if for the (unital) subalgebras
$A(S_1)$ and $A(S_2)$ generated by $S_1$ and $S_2$ respectively there exist states
$\omega_1 \in \mathscr{S}(A(S_1))$, $\omega_2 \in \mathscr{S}(A(S_2))$ such that $(A(S_1) \otimes A(S_2), \{\omega_1 \otimes \omega_2\})$ is a cover of
$(A(S_1 S_2), \omega')$ where $A(S_1 S_2)$ is the subalgebra generated by $\{S_1, S_2\}$ and $\omega'$ is
the restriction of $\omega$ to $A(S_1 S_2)$.

Proof. First assume that $S_1 = \{x\}$ and $S_2 = \{y\}$. Let $\{x_1, \ldots, x_n\}$ be an atomic
basis of $A$. Let $x = \sum_i a_i x_i$ and $y = \sum_i b_i x_i$. Some of these coefficients may be
0 and some may be equal. Write

$$x = a_1 P_1 + a_2 P_2 + \cdots + a_k P_k \quad \text{and} \quad y = b_1 Q_1 + b_2 Q_2 + \cdots + b_l Q_l$$






Here the $a_i$'s are distinct and $P_i = x_{i_1} + x_{i_2} + \cdots + x_{i_r}$ is the sum of all
basis elements whose coefficients are equal to $a_i$. Similarly for the $Q_j$'s. Note that
$P_i P_m = \delta_{im} P_i$ and $Q_j Q_s = \delta_{js} Q_j$. By Lagrange interpolation there are polynomials
$f_i(X)$, $i = 1, \ldots, k$ and $g_j$, $j = 1, \ldots, l$ such that $f_i(a_r) = \delta_{ir}$ and $g_j(b_s) = \delta_{js}$.
Since $x, y$ are $\omega$-independent

$$\omega(f_i(x) g_j(y)) = \omega(P_i Q_j) = \omega(P_i)\omega(Q_j). \qquad (4)$$

The subalgebra $A(S_1)$ ($A(S_2)$) is generated by the $P_i$'s ($Q_j$'s). Clearly $\{P_i :
i = 1, \ldots, k\}$ and $\{Q_j : j = 1, \ldots, l\}$ are atomic bases for $A(S_1)$ and $A(S_2)$
respectively. Define states $\omega_1$ and $\omega_2$ of $A(S_1)$ and $A(S_2)$ resp. by restricting $\omega$
to these subalgebras. Let $\phi : A(S_1) \otimes A(S_2) \to A(S_1 S_2)$ be the natural map $\phi(u \otimes v) = uv$.
Using equation (4) it is a routine check that $(A(S_1) \otimes A(S_2), \{\omega_1 \otimes \omega_2\})$ is a cover
of $(A(S_1 S_2), \omega')$.

Now for the general case. Since $A(S_1)$ and $A(S_2)$ are subalgebras of $A$ they
have atomic bases $\{u_i\}$ and $\{v_j\}$ respectively. As in the previous case we have
polynomials $\{p_i\}$ and $\{q_j\}$ in several variables such that $p_i(x_1, \ldots, x_{k_i}) = u_i$ and
$q_j(y_1, \ldots, y_{m_j}) = v_j$ where $x_i \in S_1$ and $y_i \in S_2$. We do not have an easy
interpolating polynomial in this case. By repeating the argument of the singleton case
above we get the appropriate cover and complete the proof.

The converse is clear from the definition of a cover and the fact that in a
product state $\omega_1 \otimes \omega_2(z_1 \otimes z_2) = \omega_1(z_1)\omega_2(z_2)$. □

We can even extend it to infinite tensor products by restricting to finite
segments. The next step is to extend the notion of independence to more than
two subsets. Let $S_1, \ldots, S_k \subset A$ and $\omega$ a state of $A$. Then the subsets are
defined to be $\omega$-independent if for all $x_i \in A(S_i)$, $i = 1, \ldots, k$ we have

$$\omega(x_1 \cdots x_k) = \omega(x_1) \cdots \omega(x_k)$$

Here $A(S_i)$ is the subalgebra generated by $S_i$. We can then show that for states
$\omega_i \in \mathscr{S}(A(S_i))$, the restriction of $\omega$ to $A(S_i)$, the pair
$(A(S_1) \otimes \cdots \otimes A(S_k), \omega_1 \otimes \cdots \otimes \omega_k)$ is a cover of $(A(S_1 \ldots S_k), \omega')$,
where $\omega'$ is the restriction of $\omega$ to $A(S_1 \ldots S_k)$,
the algebra generated by $S_1, \ldots, S_k$. We thus see the relation between
independence and (tensor) product states in the classical or commutative theory.
The non-commutative or quantum case is more delicate and requires careful
handling.

3.2 Probability distribution functions 

In this section we investigate another important concept, that of a (cumulative)
distribution function (d.f.), in the algebraic framework. As the paper's primary
concern is an alternative formulation of mathematical models of information
and communication we do not undertake an extensive exploration of the
algebraic approach to probability concepts. However, the notion of a distribution
function underpins a large part of probability theory and its applications. One
of the advantages of using C* or more general Banach algebras is that we have
both algebraic and analytical methods at our disposal.

Given a subset $S \subset A$ of an abelian C* algebra let
$S^a = \{x \in A : xs = 0 \ \forall s \in S\}$ be the annihilator of $S$. This is an ideal (see footnote 5) and hence there is
an approximate identity. An approximate identity in an ideal $B$ is a net $\{y_\lambda\}$
with $0 \le y_\lambda \le 1$ such that $x y_\lambda \to x$ for all $x \in B$ (also $y_\lambda x \to x$ if the algebra
is nonabelian). For the details see [KR97]. Obviously $S^a$ cannot contain the
identity of the original algebra unless $S = \{0\}$. We only mention that nets
[Kel75] are a generalization of sequences where the indexing set is not required
to be countable. However, in the case of separable algebras (algebras with a
dense countable set) the reader may substitute "sequence" for "net". In the
following it will suffice for our purpose to restrict to the separable case although
we often use the language of "nets". We can now define the distribution of a set of
observables.

Definition 2. Let $S = \{x_1, x_2, \ldots, x_n\}$ be a finite self-adjoint subset of $A$ where
$(A, \omega)$ is a fixed probability algebra. For $t = (t_1, t_2, \ldots, t_n) \in \mathbb{R}^n$ let $S_t \subset A$
denote the set of elements $\{(t_i 1 - x_i) : i = 1, \ldots, n\}$ and $S_t^-$ the set of elements
$\{z_- : z \in S_t\}$, the negative parts of members of $S_t$. Let $\{e_\lambda(t)\}$ be approximations of
identity in the annihilator ideal $(S_t^-)^a$. Then the $\omega$-distribution of $S$ is defined
to be the real function

$$F_S(t) = \lim_\lambda \omega(e_\lambda)$$

The rationale for this definition is simple. For convenience, restrict to a
single random variable. Suppose $X$ is a bounded random variable on a
probability space $(\Omega, S, P)$. Then the distribution function is
$f(t) = P(\{a \in \Omega : X_t(a) = t - X(a) \ge 0\})$. For a fixed $t$ write the random variable $X_t = X_{t+} - X_{t-}$ as
the difference of two non-negative random variables. Then the distribution
function of $X$ is the probability of the event $E_t$ where $E_t = \{a \in \Omega : X_t(a) \ge 0\}$.
Consider now $X_{t-}$ and $G_t = \{a : X_t(a) < 0\} = \Omega - E_t$. Then $X_{t-}$ is $> 0$
on $G_t$ and 0 outside it. If $Y$ is any function on $\Omega$ such that $Y X_{t-} = 0$ then
$Y$ must vanish on $G_t$. Conversely any function $Y$ that vanishes on $G_t$
satisfies the equation $Y X_{t-} = 0$. In particular the indicator function $I_{E_t}$
satisfies it. The function $I_{E_t}$ is the identity in $(X_{t-})^a$ and its expectation value is
$\int I_{E_t} \, dP = P(E_t)$. Although the indicator functions are not generally
continuous we can approximate them by a sequence of continuous functions. This
sequence is an approximate identity in the C* algebra of continuous functions.
In most cases of interest to us the algebras will be separable. Then the nets
can be replaced by sequences. Note that since the net $\{e_\lambda\}$ is bounded and
increasing the net $\{\omega(e_\lambda)\}$ converges. Finally, let us observe that even though the
approximate identity is not unique the distribution function as defined above
is unique. To prove this suppose $\{e_\lambda\}$, $\{f_\mu\}$ are two approximate identities. Then using
the fact that
$\omega(e_\lambda f_\mu - e_{\lambda'} f_{\mu'}) = \omega(f_\mu(e_\lambda - e_{\lambda'}) + e_{\lambda'}(f_\mu - f_{\mu'}))$ is Cauchy, since
$f_\mu(e_\lambda - e_{\lambda'}) \to (e_\lambda - e_{\lambda'})$ and $e_{\lambda'}(f_\mu - f_{\mu'}) \to f_\mu - f_{\mu'}$, we conclude that the
double-net $\{\omega(e_\lambda f_\mu)\}$ converges to the limit $\lim_\lambda \omega(e_\lambda) = \lim_\mu \omega(f_\mu)$.
Extending the definition of the d.f. to an arbitrary element $z$ in the algebra is simple.
Write $z = x + iy$ where $x$ and $y$ are self-adjoint. Let $F_x(t)$ and $F_y(t)$ denote the
d.f. of $x$ and $y$ respectively. Then the d.f. of $z$ is $F_z(t) = F_x(t) + iF_y(t)$.

5 An ideal of an algebra $A$ is a subset $I$ of $A$ which is closed under addition and such that for every
$x \in A$, $xI \subseteq I$. Hence a non-zero proper ideal cannot contain the identity of $A$.
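
In a finite-dimensional abelian algebra the annihilator ideal has a genuine identity, so the net in Definition 2 can be taken constant and the d.f. computed directly. The following sketch assumes that setting (elements as coefficient lists in an atomic basis, a state as a list of weights); the function names are ours and the data are made up for illustration.

```python
from fractions import Fraction as F


def negative_part(z):
    """z_- for a real element z, coefficient by coefficient."""
    return [max(-c, 0) for c in z]


def annihilator_identity(z):
    """Identity of the annihilator ideal of z: the sum of atoms on which z vanishes."""
    return [1 if c == 0 else 0 for c in z]


def distribution(weights, x, t):
    """F_x(t) = omega(e), e the identity of the annihilator of (t1 - x)_-."""
    shifted = [t - a for a in x]                         # the element t1 - x
    e = annihilator_identity(negative_part(shifted))
    return sum(w * c for w, c in zip(weights, e))


weights = [F(1, 2), F(1, 4), F(1, 8), F(1, 8)]
x = [0, 1, 1, 3]
values = [distribution(weights, x, t) for t in (-1, 0, 1, F(5, 2), 3)]
```

The atoms that survive are exactly those whose coefficient $a_i$ satisfies $a_i \le t$, so the function returns the total weight of $\{a_i \le t\}$, in agreement with the rationale given above.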

Theorem 4. Let $x_1, \ldots, x_n$ be self-adjoint elements of an abelian C* algebra
$A$. Let $F(t_1, \ldots, t_n)$ be their joint distribution function. Then $F(t_1, \ldots, t_n)$ is
non-negative, left-continuous and non-decreasing in each variable. We also have
boundary conditions

$$\lim_{t_1, \ldots, t_n \to \infty} F(t_1, \ldots, t_n) = 1 \quad \text{and} \quad \lim_{t_i \to -\infty} F(t_1, \ldots, t_n) = 0$$

If the elements are independent and $F_i(t_i)$ denotes the distribution function of
$x_i$ then

$$F(t_1, \ldots, t_n) = F_1(t_1) F_2(t_2) \cdots F_n(t_n).$$

If a sequence $x_n \to x$ in $A$ then the corresponding d.f.'s $F_{x_n}(t) \to F_x(t)$.

Proof. This is of course a standard result in probability theory. We sketch an
algebraic proof in the current setting. The most direct approach is to use the
notion of continuous function calculus which essentially asserts that continuous
functions on the spectrum can be lifted to define functions on the algebra. More
precisely, given an element $x \in A$ there is an isometric algebra homomorphism
between the algebra of continuous functions on the spectrum of $x$, $C(\mathrm{sp}(x))$, and
the closed subalgebra $C(x)$ generated by $x$ [KR97]. Thus for every function
$f(u)$ on $\mathrm{sp}(x)$ there is a unique element $f(x)$ in $C(x)$ such that if $f(u) \ge 0$, then
$f(x) \ge 0$. Since for any real $t$, $u$ and $\delta > 0$, $|t + \delta - u| - (t + \delta - u) \le |t - u| - (t - u)$ we
infer that $|t + \delta - x| - (t + \delta - x) \le |t - x| - (t - x)$ for self-adjoint $x \in A$. Now for any
$y \in A$ if $xy = 0$ then $|x|y = 0$ and hence $x_+ y = x_- y = 0$. So if $0 \le x \le z$ and $v \in A$
then $zv = 0$ implies $xv = 0$. Thus the annihilator ideal of $|t + \delta - x| - (t + \delta - x)$
contains the annihilator ideal of $|t - x| - (t - x)$. The continuity follows from
the following construction which is useful for calculating distributions. Write
$x(t) = t1 - x$, $t = (t_1, t_2, \ldots, t_n)$ and $X(t) = x_1(t_1)_+ \times x_2(t_2)_+ \times \cdots \times x_n(t_n)_+$.
For integer $m > 0$ let

$$e_m(t + 1/m) = m X(t + 1/m)\,(1 + m X(t + 1/m))^{-1} \qquad (5)$$

where $t + 1/m = (t_1 + 1/m, t_2 + 1/m, \ldots, t_n + 1/m)$. Although $e_m(t + 1/m)$ is not
a member of the annihilating ideal $S^-(t)^a$ of $S^-(t)$ it belongs to $S^-(t + 1/m)^a \supset
S^-(t)^a$. Let $e_\lambda(t)$ be an approximate identity in $S^-(t)^a$. One can show using
the Gelfand representation that

$$\lim_\lambda \omega(e_\lambda(t)) = \lim_{m \to \infty} \omega(e_m(t + 1/m))$$

We omit the details but the reader can convince herself by taking an algebra of
functions.

This implies the first part of the theorem. To prove the boundary conditions
we use the fact that the spectrum of any element $x \in A$ is bounded by $\|x\|$.
Hence, for $t < -\|x\|$, $t1 - x$ has a strictly negative spectrum. Then
$(t1 - x)_- = -(t1 - x)$ is invertible and its annihilator ideal consists of 0 alone. Consequently,
$F(t, \ldots) = 0$ for all $t < -\|x\|$. The other extreme case is proved similarly, $t1 - x$
being strictly positive for $t > \|x\|$. Finally, suppose the elements $\{x_1, x_2, \ldots, x_n\}$
are independent. Since $x_+$ lies in the closed subalgebra generated by $x$ the
definition of independence and equation (5) imply that the joint distribution
function is a product. One proves the last statement using a sequence like
(5). □

We see that, starting from a purely algebraic definition of independence and 
distributions we can recover their essential properties. In particular, for algebras 
which are finite or infinite tensor product of finite-dimensional algebras we have 
the following. 

Proposition 2. Let $A$ be a finite-dimensional abelian C* algebra. Let $x \in \bigotimes^{\infty} A$
and $x^a$ its annihilating ideal. Suppose $x$ is a finite sum. Then there is a unique
(up to permutation) decomposition

$$x = \sum_i a_i P_i \quad \text{such that } P_i P_j = \delta_{ij} P_j \text{ and the } a_i \text{ are distinct and non-zero.}$$

Further, there exist polynomials $g_i$ without constant term such that $P_i = g_i(x)$.
Thus, $x = \sum_i a_i g_i(x)$. Then $x^a$ has an identity $1 - \sum_i P_i$.

Proof. Since $x$ is a finite sum it may be considered as an element of $\bigotimes^n A$ for
some finite $n$. The space $\bigotimes^n A$ has a finite atomic basis, say, $\{Y_1, \ldots, Y_m\}$
($m = (\dim A)^n$). Let $x = \sum_{l=1}^{m} a_l Y_l$ and let $J = \{l : a_l \ne 0\}$. Then $x = \sum_{l \in J} a_l Y_l$.
Let $P_i$ be the sum of all $Y_l$ for which the coefficients $a_l$ are equal; then
$x = \sum_i a_i P_i$ with the $a_i$ distinct and non-zero. Next use Lagrange interpolation to obtain
polynomials $g_i$ such that $g_i(0) = 0$ and $g_i(a_j) = \delta_{ij}$. To prove uniqueness let
$x = \sum_j b_j Q_j$ be another such decomposition. Then $x P_i Q_j = a_i P_i Q_j = b_j P_i Q_j$.
Since $P_i x = a_i P_i \ne 0$ and $x = \sum_j b_j Q_j$, for a fixed $i$ there must be at least one $j_i$ with $P_i Q_{j_i} \ne 0$; then
$a_i = b_{j_i}$. There cannot be more than one such $j_i$ since the $b_j$'s are distinct.
Arguing in the reverse direction we conclude that $i \leftrightarrow j_i$ is a permutation. The
last statement follows trivially. □
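
The interpolation step of the proof can be made concrete. The sketch below assumes $x$ is given by its coefficients in a finite atomic basis and builds each projection as $g_i(x)$ with $g_i(X) = \frac{X}{a_i} \prod_{j \ne i} \frac{X - a_j}{a_i - a_j}$, one admissible choice of polynomial without constant term satisfying $g_i(a_j) = \delta_{ij}$. The function name and the sample element are ours.

```python
from fractions import Fraction as F


def spectral_projections(x):
    """Return {a_i: P_i} with P_i = g_i(x), using only sums and products in the algebra."""
    x = [F(c) for c in x]
    values = sorted({c for c in x if c != 0})          # distinct non-zero coefficients
    projections = {}
    for a in values:
        g = [c / a for c in x]                         # factor X / a_i: no constant term
        for b in values:
            if b != a:
                # multiply by (x - b1) / (a - b), coefficient by coefficient
                g = [gc * (c - b) / (a - b) for gc, c in zip(g, x)]
        projections[a] = g
    return projections


x = [2, 0, 2, 5, -1]
P = spectral_projections(x)
rebuilt = [sum(a * p[i] for a, p in P.items()) for i in range(len(x))]
annihilator_unit = [1 - sum(p[i] for p in P.values()) for i in range(len(x))]
```

The reconstructed element equals $x$, and $1 - \sum_i P_i$ is supported exactly on the atoms where $x$ vanishes, which is the identity of $x^a$ asserted in the proposition.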

Let $x = \sum_i a_i P_i$ be as in the proposition. We call this the spectral
decomposition of $x$. If $\omega$ is a state define

$$\mathscr{E}_\omega(x) = \sum_i \omega(P_i) P_i$$

The map $\mathscr{E}_\omega(x)$ can be considered as a "centroid" of the possible outcomes
of measurement of $x$. We can extend the proposition to an arbitrary element in
$\mathscr{A} = \bigotimes^{\infty} A$ by using a sequence of finite-dimensional projections as above to
approximate. However, the proposition suffices for most of our requirements.
Now let

$$Z = \sum_{k=1}^{\infty} x_k, \quad x_k \in \bigotimes^k A$$

$Z$ may not be a member of $\mathscr{A}$ in general as we treat the above as a formal
sum. However, we suppose that for real $t$, $(t1 - Z)_+ = (|t1 - Z| + (t1 - Z))/2$
can be expressed as a finite sum. We will see an example below. Then the
required identity is given as follows. It is clear that for $\delta > 0$ small enough
$|(t + \delta)1 - Z| + ((t + \delta)1 - Z) = \sum_k a_k Y_k$, $a_k > 0$, is a finite sum where the $Y_k$
constitute an atomic basis. Let $P_\delta = \sum_k Y_k$. Then the required identity is given
by $P_0 = \lim_{\delta \to 0} P_\delta$. This is essentially a variant of equation (5) in Theorem 4.

3.3 Examples 

In this section we consider some examples from standard probability theory.
It will be demonstrated that the algebraic approach not only gives a different
perspective on some familiar situations but can also provide additional
computational tools. First, we review the correspondence between some concepts from
the standard theory and our algebraic model. An event in probability theory
is a measurable subset of the probability space. The random variable
characterizing any (measurable) subset $S$ is its indicator function $I_S$. In the algebraic
language it is a projection $Q_S$. The probability of the event corresponds to the
expectation value $\omega(Q_S)$ of the projection. In the cases we consider the
projections will generally exist in the algebra itself. In some cases we consider infinite
formal sums which are not in the algebra but any finite segment of the sum does
belong to the algebra. In the actual computation we always use a "cut-off" to
restrict to such a finite segment. In the cases where projections are not members
of the algebra we can find a sequence (or net) that "converges in the mean" to
the appropriate projection or indicator function. This situation generally arises
in the continuous case which is only touched upon peripherally.

1. Binomial distribution. Consider again infinite sequences of Bernoulli
trials as in the second example of the previous section. We can think
of coin-tossing with "heads" signaling success. Let $Z$ be the observable
(random variable) corresponding to the number of successes. What is its
d.f.? Let $n, k$ be positive integers with $k < n$. We want to find the
distribution $F(k : n)$ of $Z$. Recall that $G$ is the 2-dimensional algebra and
let $A = \bigotimes^n G$. Let $\{y_0, y_1\}$ be the atomic basis of $G$ with $y_1$ corresponding
to success. Set

$$Z = \sum_S y_1 \otimes y_0 \otimes \cdots \otimes y_0 + 2 \sum_S y_1 \otimes y_1 \otimes y_0 \otimes \cdots \otimes y_0 + \cdots$$

Here $S$ denotes the distinct permutations of the factors in the tensor
product. Thus, the $r$th term is $r Y_r$ where $Y_r$ is the sum of all $\binom{n}{r}$ products with $r$ $y_1$'s;
on $Y_r$ the value of $Z$ is $r$. Note that $Y_r Y_s = \delta_{rs} Y_r$. We have

$$U = |k1 - Z| - (k1 - Z) = 2 \sum_{r=k+1}^{n} (r - k) Y_r$$

In this case the identity in the annihilator ideal of $U$ exists and is given
by the projection operator $P = \sum_{r=0}^{k} Y_r$. For Bernoulli trials with success
probability $p$ (a product state $\omega$),

$$F(k : n) = \omega(P) = \sum_{r=0}^{k} \binom{n}{r} p^r (1 - p)^{n - r}.$$

Note that we can easily find the distribution
in states where the observables are not independent.

2. Waiting time. Let us start with a simple version of the problem of
waiting time. Suppose we have a binary source with fixed probability
distribution emitting a bit per unit time. The waiting time is the time elapsed
before the first appearance of 1. It is a random variable or observable $W$
in our formalism. Using the notation above

$$W = y_0 \otimes y_1 \otimes 1 \otimes \cdots + 2\, y_0 \otimes y_0 \otimes y_1 \otimes 1 \otimes \cdots + 3\, y_0 \otimes y_0 \otimes y_0 \otimes y_1 \otimes 1 \otimes \cdots + \cdots$$

This is an unbounded infinite sum and does not belong to the algebra.
However, for any $t \ge 0$,

$$(t1 - W)_+ = \frac{|t1 - W| + t1 - W}{2} = t\, y_1 \otimes 1 + (t - 1)\, y_0 \otimes y_1 \otimes 1 + \cdots + (t - \lfloor t \rfloor) \underbrace{y_0 \otimes \cdots \otimes y_0}_{\lfloor t \rfloor \text{ factors}} \otimes\, y_1 \otimes 1$$

is finite (of course, $F_W(t) = 0$ for $t < 0$). Here $\lfloor t \rfloor$ is the largest integer
$\le t$. Using the trick explained before the examples we replace $t$ by $t + \delta$
(this is to take into account the case when $t$ is an integer). The required
projection (approximate identity) is

$$P_W(t) = y_1 \otimes 1 + y_0 \otimes y_1 \otimes 1 + \cdots + \underbrace{y_0 \otimes \cdots \otimes y_0}_{\lfloor t \rfloor \text{ factors}} \otimes\, y_1 \otimes 1$$

The distribution function in a state $\Omega$ is given by $F_W(t) = \Omega(P_W(t))$. If
$\Omega = \omega \otimes \omega \otimes \cdots$ is an infinite product state with $\omega(y_1) = p = 1 - \omega(y_0)$
then $F_W(t) = \sum_{k=0}^{\lfloor t \rfloor} p(1 - p)^k$.

Next we generalize the problem of waiting time to arbitrary strings.
Explicitly, given a string $\xi$ of length $n$ the waiting time is the time before a
contiguous stream of bits matching $\xi$ appears. The preceding case is for
$\xi = 1$. We will only construct the observable corresponding to waiting
time $W$ in this general case. It gives a nice illustration of the algebraic
techniques. Let $X$ be the tensor representation of $\xi$. Waiting time 0
corresponds to the observable $X \otimes 1$. We use the following notation. Write
$1_1$ for the identity in the 2-dimensional space $G$ and $1_k = 1_1 \otimes 1_1 \otimes \cdots \otimes 1_1$,
the $k$-fold tensor product. The symbol 1 (without subscripts) will be
reserved for the identity in $\bigotimes^{\infty} A$. The element $Y_0 = X \otimes 1$ corresponds to
waiting time 0: the first $n$ symbols received match the given string. We
expect the element corresponding to waiting time 1 will be "proportional"
to $Y_1' = 1_1 \otimes X \otimes 1$. Although $Y_0$ and $Y_1'$ are projections they need not
be orthogonal in the sense $Y_1' Y_0 = 0$. So they do not correspond to
mutually exclusive events. Recall that when interpreted as functions on some
measure space projections are indicators of measurable sets (events). We
therefore adopt an orthogonalization scheme similar to Gram-Schmidt.
The observable $Y_1 = Y_1' - Y_1' Y_0$ is a projection and satisfies $Y_1 Y_0 = 0$.
Viewed as a function it takes value 1 only when the input string is of the
form $\zeta = b_0 \xi \cdots$ and such that the prefix of length $n$ of $\zeta$ does not match $\xi$. It
corresponds to waiting time 1. Defining inductively let

$$Y_m = Y_m' - Y_m'(Y_0 + Y_1 + \cdots + Y_{m-1}) = 1_m \otimes X \otimes 1 - (1_m \otimes X \otimes 1)(Y_0 + Y_1 + \cdots + Y_{m-1})$$

It is easily verified that $Y_j Y_k = \delta_{jk} Y_k$. The element $W = \sum_k k Y_k$
corresponds to the waiting time in this case. Again it is not an element
of the algebra but $|t1 - W| + t1 - W$ is.

3. Markov Chains. We define a discrete time Markov chain on an
observable algebra $(A, \omega)$ as a sequence of positive and unital maps $\{\phi_0, \phi_1, \ldots\}$
and an initial element $x \in A$. Let us confine to discrete chains. Let
$\{x_1, x_2, \ldots\}$ be a fixed atomic basis of $A$. A chain-state is a sequence
$\{z_0, z_1, \ldots\}$ where each $z_i$ is an element of this basis. The usual term for what we call
chain-state is simply "state" but the latter has a very specific meaning in
operator algebras. Let $\xi_n = \{z_0, z_1, \ldots, z_n\}$ be a finite segment of the
chain-state. We are interested in the transition from $z_0$ to $z_n$ via the path $\xi_n$.
The transition probability is defined recursively as follows.

$$y_1 = \phi_0(z_0), \quad y_k = \phi_{k-1}(z_{k-1} y_{k-1}) \quad \text{and transition probability } p(z_0 \to z_n) = \omega(z_n y_n)$$

Let us examine this definition in the special case of stationary Markov
chains. A Markov chain is defined to be stationary if all the transition
maps are identical: $\phi_0 = \phi_1 = \phi_2 = \cdots$. For a stationary chain

$$p(z_0 \to z_n) = \omega(z_n \phi(z_{n-1} \phi(z_{n-2} \phi(\cdots z_1 \phi(z_0))))) = \omega(z_n)\, \phi(i_n, i_{n-1})\, \phi(i_{n-1}, i_{n-2}) \cdots \phi(i_1, i_0)$$

Here $\phi(i, j)$ is the $(i, j)$th matrix element of $\phi$ with respect to the atomic basis and
$z_k = x_{i_k}$. This looks very similar to quantum transition probability. In the
latter case the $x_i$ are projections on a Hilbert space. Further, when we consider
transitions over all possible paths then we get an analogue of Feynman's "sum
over paths" for total transition probability.
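
The three examples can be checked by brute-force enumeration over atoms. The sketch below is ours and rests on stated assumptions: atoms of a tensor power of $G$ are represented as bit tuples, the state is a product state with $\omega(y_1) = p$, and for the Markov chain `phi[i][j]` is taken to be the matrix element $\phi(i, j)$ defined by $\phi(x_j) = \sum_i \phi(i, j) x_i$. All numerical data are made up.

```python
from fractions import Fraction as F
from itertools import product
from math import floor


def atom_weight(bits, p):
    """Product-state weight of the atom y_{b_1} (x) ... (x) y_{b_m}."""
    w = F(1)
    for b in bits:
        w *= p if b == 1 else 1 - p
    return w


def binomial_df(n, k, p):
    """omega(P) for P = sum_{r <= k} Y_r: total weight of atoms with at most k successes."""
    return sum(atom_weight(bits, p)
               for bits in product((0, 1), repeat=n) if sum(bits) <= k)


def waiting_time_df(t, p):
    """Omega(P_W(t)): weight of atoms whose first 1 follows at most floor(t) zeros."""
    if t < 0:
        return F(0)
    m = floor(t) + 1                        # only the first m factors matter
    return sum(atom_weight(bits, p)
               for bits in product((0, 1), repeat=m) if 1 in bits)


def path_probability(phi, weights, path):
    """omega(z_n phi(z_{n-1} phi(... z_1 phi(z_0)))) with z_k the atom x_{path[k]}."""
    dim = len(weights)
    vec = [F(int(i == path[0])) for i in range(dim)]             # coefficients of z_0
    for idx in path[1:]:
        image = [sum(phi[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        vec = [image[i] if i == idx else F(0) for i in range(dim)]   # multiply by atom z_k
    return sum(w * v for w, v in zip(weights, vec))


p = F(1, 3)
phi = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]    # rows sum to 1, so phi(1) = 1
weights = [F(1, 3), F(2, 3)]                       # the state omega on the atoms
```

For these values the enumeration reproduces $\sum_{r \le k} \binom{n}{r} p^r (1-p)^{n-r}$, the geometric sum $\sum_{k \le \lfloor t \rfloor} p(1-p)^k$, and the product formula $\omega(z_n) \phi(i_n, i_{n-1}) \cdots \phi(i_1, i_0)$ respectively.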
3.4 Limit theorems

The limit theorems of probability theory are important for its theoretical
structure as well as its empirical justification. We will be primarily concerned with
the bounded case where the proofs are simpler. We state two of these but prove
only the weak law of large numbers. From the information theory perspective
it is perhaps the most useful limit theorem. Let $X_1, X_2, \ldots, X_n$ be
independent, identically distributed (i.i.d.) random variables on a probability space $\Omega$
with probability measure $P$. Let $\mu$ be the mean of $X_1$ (hence of any $X_i$). We
assume that the variance $E(X_1 - \mu)^2$ is bounded. Here, $E(X)$ denotes the
expectation value of random variable $X$.

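• Weak law of large numbers. Given $\epsilon > 0$,

$$\lim_{n \to \infty} P(|S_n - \mu| > \epsilon) = 0, \quad \text{where } S_n = \frac{X_1 + \cdots + X_n}{n}$$

• Central limit theorem. If $0 < E(X_i^2) = \sigma < \infty$ then for any real $x$, as
$n \to \infty$,

$$P(\,\cdots \le x) \to \Phi(x) = \frac{1}{\sqrt{2\pi\sigma}} \int_{-\infty}^{x} \exp(-(t - \mu)^2 / 2\sigma)\, dt$$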

A few comments about these famous limit theorems. These are statements
about different types of convergence [Bil95]. The theorems can be strengthened
but since we are dealing with bounded random variables the above
formulations suffice. These theorems require assigning of probabilities. All we have at
our disposal is the algebra and one or more positive functionals (states) which
give us expectation values. But we have already seen how to define probability
distribution functions. What we need are appropriate projections or
approximations to them. Given a self-adjoint observable $x$ and a real number $a$ write
$x - a$ for the element $x - a1$. Let $A(x - a)_+$ be the (two-sided) ideal
generated by the positive part of $x - a$. Let $e_n = (x - a)_+ [(x - a)_+ + \delta_n]^{-1}$
where $0 < \delta_n$ is such that $\lim_{n \to \infty} \delta_n = 0$. Then it can be shown that for any
$y \in A(x - a)_+$, $e_n y \to y$ in the norm as $n \to \infty$. Hence, $\{e_n\}$ is an increasing
sequence approximating identity (see [Tak02]). We write $\mathbb{P}(x > a)$ for this
approximate identity in $A(x - a)_+$. It is not unique but that does not matter
since all the limits that we use it to define are independent of the
particular choice. The probability corresponding to the "event" $x > a$ is defined
to be $P(x > a) = \omega(\mathbb{P}(x > a)) = \lim_{n \to \infty} \omega(e_n)$. Similarly we can define
$P(x < a) = \omega(\mathbb{P}(x < a))$ where $\mathbb{P}(x < a) = \{f_n\}$ is an approximate identity in
the ideal $A(x - a)_-$ obtained by replacing $(x - a)_+$ by $(x - a)_-$ in $e_n$. We can
define more complicated events by algebraic operations but it is not necessary
for what follows. We also note that although we use probabilistic language in
the statements of the results below all the expressions are actually defined in a
strictly algebraic setting without reference to any underlying probability space.
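
The behaviour of the sequence $e_n$ is easy to see in a finite-dimensional abelian algebra. The sketch below assumes elements are coefficient lists in an atomic basis and the state is a list of weights; the element, the weights and the choice $\delta_n \in \{1, 1/10, 1/100\}$ are ours.

```python
from fractions import Fraction as F


def positive_part(z):
    """z_+ for a real element z, coefficient by coefficient."""
    return [max(c, 0) for c in z]


def approximate_identity(x, a, delta):
    """e = (x - a)_+ [(x - a)_+ + delta]^{-1}, evaluated on each atom."""
    pos = positive_part([c - a for c in x])
    return [c / (c + delta) for c in pos]


def expectations(weights, x, a, deltas):
    """omega(e_n) for each delta_n in deltas."""
    return [sum(w * e for w, e in zip(weights, approximate_identity(x, a, d)))
            for d in deltas]


weights = [F(1, 2), F(1, 4), F(1, 4)]
x = [F(0), F(1), F(3)]
approximations = expectations(weights, x, 1, [F(1), F(1, 10), F(1, 100)])
exact = sum(w for w, c in zip(weights, x) if c > 1)      # weight of the atoms with x > a
```

As $\delta_n$ decreases the values $\omega(e_n)$ increase towards the weight of the atoms on which $x > a$, which is the probability $P(x > a)$ defined above.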
Lemma 1 (Chebyshev inequality). Let $x, y \in A$ be self-adjoint where $(A, \omega)$ is
an observable algebra and $y \ge 0$. For any number $\epsilon > 0$ we have

$$P(y > \epsilon) \le \frac{\omega(y)}{\epsilon}$$

and

$$P(|x - \omega(x)| > \epsilon) \le \frac{\omega([x - \omega(x)]^2)}{\epsilon^2}$$

Proof. Let $\{e_n\}$ be an approximate identity in the ideal $A(y - \epsilon)_+$. By definition
$e_n \le 1$. Hence, $\omega(y) = \omega(y e_n) + \omega(y(1 - e_n)) \ge \omega(y e_n)$. Since $(y - \epsilon)_-$
annihilates the ideal $A(y - \epsilon)_+$, $\omega(y e_n) = \omega([y - \epsilon] e_n) + \omega(\epsilon e_n) =
\omega([y - \epsilon]_+ e_n) + \epsilon\omega(e_n) \ge \epsilon\omega(e_n)$. Hence, $\omega(y) \ge \epsilon\omega(e_n)$. Taking limits we obtain the
first inequality. Observe that for any $x \in A$, $P(|x| > \epsilon) = P(|x|^2 > \epsilon^2)$ for the
ideals $A(|x| - \epsilon)_+$ and $A(|x|^2 - \epsilon^2)_+$ coincide. This follows from the identities
$|x|^2 - \epsilon^2 = (|x| + \epsilon)(|x| - \epsilon)$ and hence $(|x|^2 - \epsilon^2)_+ = (|x| + \epsilon)(|x| - \epsilon)_+$ plus the
fact that $|x| + \epsilon$ is invertible. Hence the second inequality follows from the first
by putting $y = (x - \omega(x))^2$ and using $\epsilon^2$ in place of $\epsilon$. □
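
Both inequalities of Lemma 1 can be checked on a small example. The sketch assumes a finite-dimensional abelian algebra, where $P(y > \epsilon)$ is simply the total weight of the atoms on which $y$ exceeds $\epsilon$; the data and function names are ours.

```python
from fractions import Fraction as F


def expectation(weights, z):
    """omega(z) for z given by its coefficients in the atomic basis."""
    return sum(w * c for w, c in zip(weights, z))


def prob_exceeds(weights, y, eps):
    """P(y > eps): weight of the atoms on which y exceeds eps."""
    return sum(w for w, c in zip(weights, y) if c > eps)


def chebyshev(weights, x, eps):
    """Return (P(|x - omega(x)| > eps), omega([x - omega(x)]^2) / eps^2)."""
    mean = expectation(weights, x)
    deviation = [abs(c - mean) for c in x]
    variance = expectation(weights, [d * d for d in deviation])
    return prob_exceeds(weights, deviation, eps), variance / eps ** 2


weights = [F(1, 2), F(1, 4), F(1, 4)]
x = [0, 1, 3]
lhs, rhs = chebyshev(weights, x, F(3, 2))

y = [1, 0, 4]                                   # y = (x - omega(x))^2 >= 0
first_lhs = prob_exceeds(weights, y, F(9, 4))
first_rhs = expectation(weights, y) / F(9, 4)
```

The second pair of numbers is the first inequality applied to $y = (x - \omega(x))^2$ with $\epsilon^2$ in place of $\epsilon$, and it coincides with the first pair, as in the last step of the proof.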

We will prove next a convergence result which implies the weak law of large
numbers.

Theorem 5 (Law of large numbers (weak)). If $x_1, \ldots, x_n, \ldots$ are
$\omega$-independent self-adjoint elements in an observable algebra and $\omega(x_i^k) = \omega(x_j^k)$
for all positive integers $i, j$ and $k$ (they are identically distributed) then

$$\lim_{n \to \infty} \omega\left(\left|\frac{x_1 + \cdots + x_n}{n} - \mu\right|^k\right) = 0 \quad \text{where } \mu = \omega(x_1) \text{ and } k > 0.$$

Proof. We may assume $\mu = 0$ (by reasoning with $x_i - \omega(x_i)$ instead of $x_i$).
First we prove the statement for $k = 2$. Then
$\omega\left(\left|\frac{x_1 + \cdots + x_n}{n}\right|^2\right) = \sum_i \omega(x_i^2)/n^2 = \omega(x_1^2)/n$.
The first equality follows from independence ($\omega(x_i x_j) = \omega(x_i)\omega(x_j) = 0$
for $i \ne j$), the second from the fact that they are identically distributed. The
case $k = 2$ is now trivial. Now let $k = 2m$. Then $|x_1 + \cdots + x_n|^k = (x_1 + \cdots + x_n)^k$.
Put $s_n = (x_1 + \cdots + x_n)/n$. Expanding $s_n^k$ in a multinomial series we note that
independence and the fact that $\omega(x_i) = 0$ imply that all terms in which at least
one of the $x_i$ has power 1 do not contribute to $\omega(s_n^k)$. The total number of the
remaining terms is $O(n^m)$. Since the denominator is $n^{2m}$ we see that $\omega(s_n^k) \to 0$.
Since for any $x \in A$, $|x| = (x^2)^{1/2}$ can be approximated by polynomials in $x^2$ we
conclude that $\omega(|s_n|) \to 0$. Finally, using the Cauchy-Schwarz inequality
$\omega(|s_n|^{2r+1})^2 \le \omega(s_n^2)\,\omega(s_n^{4r})$ we see that the theorem is true for all $k$. □

Corollary 5.1. Let $x_1, \ldots, x_n$ and $\mu$ be as in the Theorem and set
$s_n = (x_1 + \cdots + x_n)/n$. Then for any $\epsilon > 0$ there exists $n_0$ such that for all $n > n_0$

$$P(|s_n - \mu| > \epsilon) < \epsilon$$

Proof. Using the Chebyshev inequality we have

$$P(|s_n - \mu| > \epsilon) = P(|s_n - \omega(s_n)| > \epsilon) \le \frac{\omega(|s_n - \mu|^2)}{\epsilon^2}.$$

By Theorem 5 there is $n_0$ such that $\omega(|s_n - \mu|^2) < \epsilon^3$ for $n > n_0$. □
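
The identity $\omega(s_n^2) = \omega(x_1^2)/n$ used in the proof can be verified exactly for small $n$. The sketch assumes $n$ copies of a two-dimensional algebra in a product state, with a centred observable taking the values $-1$ and $2$ with weights $2/3$ and $1/3$ (our choice), and enumerates the atoms of the tensor product.

```python
from fractions import Fraction as F
from itertools import product


def moment_of_mean(values, weights, n, k):
    """omega(s_n^k) for s_n = (x_1 + ... + x_n)/n in the n-fold product state."""
    total = F(0)
    for atom in product(range(len(values)), repeat=n):
        w = F(1)
        for i in atom:
            w *= weights[i]                       # product-state weight of the atom
        s = sum(values[i] for i in atom) / F(n)   # value of s_n on the atom
        total += w * s ** k
    return total


values = [F(-1), F(2)]
weights = [F(2, 3), F(1, 3)]                      # omega(x) = 0 and omega(x^2) = 2
second_moments = [moment_of_mean(values, weights, n, 2) for n in range(1, 6)]
fourth_moments = [moment_of_mean(values, weights, n, 4) for n in (1, 2, 4)]
```

The second moments equal $2/n$, so the Chebyshev bound of Corollary 5.1 reads $P(|s_n| > \epsilon) \le 2/(n\epsilon^2)$ for this example, and the fourth moments decrease as the counting argument for $k = 2m$ suggests.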
4 Communication and Information



We now come to our original theme: an algebraic framework for communication 
and information processes. We can view information as a measure of our state 
of ignorance or uncertainty. Mathematically, it is equivalent to some measure 
associated with a probability distribution of some physical quantity which we 
identify with an observable. Thus any manipulation of the quantity, for example, 
transmitting it or measuring it is given by some operation on the observable. 
Since our primary goal is the modeling of information processes we refer to the 
simple model of communication in the Introduction and model different aspects 
of it. 

4.1 Source and coding 

Definition 3. A source is a pair $\mathscr{S} = (X, S)$ where $X \subset A$, $A$ is a
C* algebra and $S$ is a set of states. A source is static if $S$ consists of a single
state. It is discrete if $X$ is countable.

This definition abstracts the essential properties of a source. A real source 
could be an animate (human speech, for example) or inanimate object (a radio 
transmitter, for example). Its output can be considered discrete, for example, 
a keyboard with a fixed alphabet or continuous like radiation from a star. In 
this work we will be mainly concerned with discrete sources. Then X will be 
called the source alphabet. We assume that at each instant there is a probability 
distribution on the letters of the alphabet characterizing the state of the source 
at that instant. Thus a discrete source is a countable set of random variables. 
In the algebraic view it is a sequence of elements X of a C* algebra. The set of 
states S, called the states of the source, provide the probability distributions. 
If this distribution does not change (equivalently S consists of a single element) 
then we have a static source. We will mostly deal with static sources in this 
work. When we model transmission of information as a Markov process the 
state of the source is identified with the initial probability distribution. There
is a dual view. Suppose that a source emits letters from a finite alphabet. Then
the set $X$ in the above definition is a subset of the atomic basis (corresponding
to the alphabet) of the algebra $A$. For a state $\omega$ define

\[ O_\omega = \sum_{i=1}^{n} \omega(x_i)\, x_i, \qquad \{x_1, \dots, x_n\} \text{ an atomic basis.} \]

We say that $O_\omega$ is the output of the source in state $\omega$. Intuitively,
$O_\omega$ is a kind of mean "point" in the space of outputs (compare it with the notion
of center of mass in mechanics). More importantly, it facilitates calculation of
important quantities and has a close analogy with the quantum case. The quantum
analogue may be pictured as follows. The source outputs "particles" in definite
"states" $x_i$ with probability $p_i = \omega(x_i)$. Note that here state corresponds to
a projection operator. A measurement for $x_i$ means applying the dual operator
$\omega_i$ (with $\omega_i(x_j) = \delta_{ij}$), giving $\omega_i(O_\omega) = p_i$.
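As a small illustration of the source output, a finite-dimensional abelian
algebra can be modelled as the set of real vectors of length $n$ with
componentwise operations; the atomic basis is then the set of indicator vectors
and a state is given by a probability vector. The sketch below adopts this
model; the helper names (`atomic_basis`, `make_state`, `source_output`,
`dual_functional`) are hypothetical and introduced only for the example.

```python
def atomic_basis(n):
    """Indicator vectors x_1, ..., x_n of the n-dimensional abelian algebra."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def make_state(p):
    """State omega(a) = sum_i p_i a_i determined by a probability vector p."""
    def omega(a):
        return sum(pi * ai for pi, ai in zip(p, a))
    return omega

def source_output(omega, basis):
    """O_omega = sum_i omega(x_i) x_i."""
    out = [0.0] * len(basis)
    for x in basis:
        w = omega(x)                                   # probability of the letter x
        out = [o + w * xi for o, xi in zip(out, x)]
    return out

def dual_functional(i):
    """Dual functional omega_i with omega_i(x_j) = delta_ij."""
    return lambda a: a[i]

p = [0.5, 0.25, 0.25]                 # assumed letter probabilities
basis = atomic_basis(3)
O = source_output(make_state(p), basis)
# "Measuring" x_i on the source output returns p_i.
print([dual_functional(i)(O) for i in range(3)])
```

In this model $O_\omega$ is simply the probability vector itself, which is why
it is convenient for computing quantities such as the entropy.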
Let $\mathscr{S} = (X, \omega)$ be a static discrete source. Suppose every $x \in X$
belongs to a finite-dimensional subalgebra generated by a (finite) set of
$\omega$-independent elements. Then using the Theorem we may assume that
$A = \otimes^\infty B$ where $B$ is a finite-dimensional abelian C* algebra and $\omega$
is an (infinite) product state. In this case, each element of $X$ is a tensor product
of elements of an atomic basis of $B$. In the rest of the paper we assume that $X$ is
the product basis of atomic elements. For example, if $B$ is the two-dimensional
algebra with atomic basis $\{y_0, y_1\}$ then $X$ is the set of elements of the form
$z_1 \otimes z_2 \otimes \cdots \otimes z_k \otimes 1 \otimes 1 \otimes \cdots$
where $z_i \in \{y_0, y_1\}$.

4.2 Source coding 

Let $B$ be a finite-dimensional C* algebra and $A = \otimes^\infty B$. We consider
$\otimes^n B$ as a subalgebra of $A$ via the standard embedding (all "factors" beyond
the $n$th place equal 1). Let $X_n$ be its atomic basis in some fixed ordering and let
$X = \bigcup_n X_n$. We can consider $B$ as the source alphabet and $X_n$ as strings of
length $n$. Let $B'$ be another finite-dimensional C* algebra and
$A' = \otimes^\infty B'$. A source coding is a linear map
$f : B \to T = \sum_{k \ge 1} \otimes^k B'$. Here $T$ is a linear subspace. It induces
a (linear) map

\[ \otimes^n f : \otimes^n B \to A' \quad \text{given by} \quad \otimes^n f(x_1 \otimes \cdots \otimes x_n) = f(x_1) \otimes \cdots \otimes f(x_n). \]

$\otimes^n f$ extends to a unique map $F : A \to A'$. Note that we first induce a map on
$\otimes^n B$, $n = 1, 2, \dots$, and then lift it to $A$. We allow the map $f$ to take
values that are not simple products. However, for classical communication we require
that each atomic basis element $x_i \in B$ be mapped to a tensor product of atomic
basis elements. Since we are dealing with classical information in this paper it
will be implicitly assumed that all the codes are classical. Let us consider an
example to clarify these points.

Example. Let $\{x_0, x_1, x_2, x_3\}$ be an atomic basis for $B$. Let $B' = \mathbb{C}^2$
with atomic basis $\{y_0, y_1\}$. Define $f_1$ by $f_1(x_0) = y_0$, $f_1(x_1) = y_1$,
$f_1(x_2) = y_0 \otimes y_1$ and $f_1(x_3) = y_1 \otimes y_0$. Denote by $f_1$ also its
extension to tensor products. Since $f_1(x_0 \otimes x_1) = y_0 \otimes y_1 = f_1(x_2)$,
$f_1$ is not injective. Hence it cannot be inverted on its range. Consider next the map
$f_2(x_0) = y_0$, $f_2(x_1) = y_0 \otimes y_1$, $f_2(x_2) = y_0 \otimes y_1 \otimes y_1$
and $f_2(x_3) = y_1 \otimes y_1 \otimes y_1$. This map is invertible but one has to
look at the complete product before finding the inverse. It is not prefix-free.
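The two codes of the example can be examined mechanically in the standard string
picture, writing $y_0, y_1$ as the characters `0`, `1` and tensor products as
concatenation. The sketch below is an illustration only: it uses the classical
Sardinas-Patterson test for unique decodability, which is a standard
combinatorial technique and not the algebraic argument of the paper, and all
function names are hypothetical.

```python
def is_prefix_free(words):
    """True if no codeword is a proper prefix of (or equal to) another one."""
    words = list(words)
    return not any(i != j and b.startswith(a)
                   for i, a in enumerate(words) for j, b in enumerate(words))

def dangling_suffixes(A, B):
    """Non-empty w with a + w == b for some a in A and b in B."""
    return {b[len(a):] for a in A for b in B
            if len(b) > len(a) and b.startswith(a)}

def is_uniquely_decodable(words):
    """Sardinas-Patterson test: the code is uniquely decodable iff no
    dangling suffix generated from the code is itself a codeword."""
    words = list(words)
    code = set(words)
    if len(code) < len(words):        # two letters share the same codeword
        return False
    current = dangling_suffixes(code, code)
    seen = set()
    while current:
        if current & code:
            return False
        seen |= current
        current = (dangling_suffixes(code, current)
                   | dangling_suffixes(current, code)) - seen
    return True

f1 = {"x0": "0", "x1": "1", "x2": "01", "x3": "10"}
f2 = {"x0": "0", "x1": "01", "x2": "011", "x3": "111"}
for name, code in (("f1", f1), ("f2", f2)):
    words = list(code.values())
    print(name, is_prefix_free(words), is_uniquely_decodable(words))
```

For `f1` both tests fail, matching the observation that $f_1$ is not injective
on products; for `f2` the code is uniquely decodable without being prefix-free.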

Now going back to the general formulation, a code $f : B \to T$ is defined to be
prefix-free if for distinct members $x_1, x_2$ in an atomic basis of $B$,
$f'(x_1) f'(x_2) = 0$ where $f'$ is the map $f' : B \to \otimes^\infty B'$ induced by
$f$. That is, distinct elements of the atomic basis of $B$ are mapped to orthogonal
elements. Recall that two elements $x, y$ of an algebra are considered orthogonal if
their product $xy = 0$ (see footnote 6). Now, in the standard formulation an alphabet
is a finite set and a code is a map from $Y \to Z^+$ where $Y, Z$ are alphabets and
$Z^+$ is the set of non-empty finite strings from $Z$. The definition of prefix-free
in this case is clear. In the algebraic language the free monoidal structure defined
by concatenation is replaced by the tensor structure. Then the "code-word"
$z_1 \otimes z_2 \otimes \cdots \otimes z_k \otimes 1 \otimes 1 \otimes \cdots$
is not orthogonal to another
$z'_1 \otimes z'_2 \otimes \cdots \otimes z'_m \otimes 1 \otimes 1 \otimes \cdots$
with $k \le m$ if and only if $z_1 = z'_1, \dots, z_k = z'_k$. We observe that one has
to be careful about the correspondence between the two approaches. For example, one
might be tempted to identify the identity 1 with the empty string, but 1 is the sum
of the members of an atomic basis! The binary operation "+" has a relatively minor
role in the classical formalism but it is crucial in the quantum framework (via the
superposition principle). Our first result is a useful and well-known inequality
proved using algebraic techniques.

6 The use of the term "orthogonal" may be questionable since there is no scalar product.
But let us observe that the projection operators corresponding to two pure states in quantum
mechanics have algebraic product 0 if and only if they are orthogonal.

Lemma 2 (Kraft inequality). Let $B$ be an $n$-dimensional abelian C* algebra.
Corresponding to a finite sequence $k_1 \le k_2 \le \cdots \le k_m$ of positive integers
let $\alpha_1, \dots, \alpha_m$ be a set of prefix-free elements in
$\sum_{i \ge 1} \otimes^i B$ such that $\alpha_i \in \otimes^{k_i} B$. Further, suppose
that each $\alpha_i$ is a tensor product of elements from a fixed atomic basis of $B$.
Then

\[ \sum_{i=1}^{m} n^{-k_i} \le 1. \tag{6} \]

Proof. Let $b = \{y_1, \dots, y_n\}$ be the fixed atomic basis of $B$ and set $k_m = M$.
We can then restrict our attention to the finite-dimensional algebra
$Z = \otimes^M B$. Let
$\alpha_1 = z_1 \otimes \cdots \otimes z_{k_1} \otimes 1 \otimes \cdots \otimes 1$ where
$z_j \in b$. Let $\beta = z_1 \otimes \cdots \otimes z_{k_1}$ and

\[ Z_1 = \{ \beta \otimes \gamma : \gamma \in \otimes^{M - k_1} B \}. \]

Then $Z_1 \subset \otimes^M B$ is a subalgebra (without unit) of dimension $n^{M - k_1}$.
The assumption that the $\alpha_i$ are prefix-free implies that
$\alpha_2, \alpha_3, \dots, \alpha_m$ must be in $Z'_1$, the "orthogonal" complement to
$Z_1$ in $Z$. The dimension of $Z'_1$ is $n^M - n^{M - k_1}$. Repeating this argument
with $\alpha_2, \dots, \alpha_{m-1}$ we conclude that $\alpha_m$ must be in a subspace
of dimension $n^M - n^{M - k_1} - n^{M - k_2} - \cdots - n^{M - k_{m-1}}$. Since
$\alpha_m$ is non-zero,
$n^M - n^{M - k_1} - n^{M - k_2} - \cdots - n^{M - k_{m-1}} \ge 1$. Since $M = k_m$,
dividing by $n^M$ shows that this is equivalent to the relation (6). □

With the notation of the lemma we call the sequence $W = \{\alpha_1, \dots, \alpha_m\}$
decipherable if the tensor products of any two distinct finite ordered sequences
of elements from $W$ are distinct. The sequences may have repeated elements.
The Kraft inequality is valid for decipherable sequences [Mac53]. However,
the proof is essentially combinatorial. The Kraft inequality also provides a
sufficiency condition for prefix-free codes [Ash90, CT99]. Thus the existence of
a decipherable code of word-lengths $(k_1, k_2, \dots, k_m)$ implies the existence of a
prefix-free code of the same word-lengths. In the following, we restrict ourselves to
prefix-free codes. If $g : A \to \otimes^\infty B$ is a prefix-free code then it maps
orthogonal elements to orthogonal elements. It is therefore an algebra isomorphism (a
one-to-one homomorphism). Next we have a technical lemma that is useful in
finding bounds.
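As a numerical aside on Lemma 2, the dimension count in its proof can be
reproduced directly. The sketch below is an illustration with hypothetical
function names: `kraft_sum` evaluates $\sum_i n^{-k_i}$ exactly, and
`remaining_dimension` evaluates $n^M - \sum_{i<m} n^{M-k_i}$, the dimension left
for the last code word in the argument above.

```python
from fractions import Fraction

def kraft_sum(lengths, n):
    """Exact value of sum_i n^(-k_i) for word lengths k_i over an n-letter alphabet."""
    return sum(Fraction(1, n**k) for k in lengths)

def remaining_dimension(lengths, n):
    """n^M - sum_{i<m} n^(M - k_i) with M the largest length: the dimension of the
    'orthogonal' complement available to the last (longest) code word."""
    ks = sorted(lengths)
    M = ks[-1]
    return n**M - sum(n**(M - k) for k in ks[:-1])

for lengths in ([1, 2, 3, 3], [1, 1, 2], [2, 2, 2]):
    s = kraft_sum(lengths, 2)
    d = remaining_dimension(lengths, 2)
    # The last word fits (d >= 1) exactly when the Kraft sum is at most 1.
    assert (d >= 1) == (s <= 1)
    print(lengths, s, d)
```

The lengths `[1, 2, 3, 3]` saturate the inequality, while `[1, 1, 2]` leaves no
room for a third word, so no binary prefix-free code with those lengths exists.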
Lemma 3. Let $f$ be a continuous real function on $(0, \infty)$ such that $x f(x)$ is
convex and $\lim_{x \to 0} x f(x) = 0$. Let $A$ be a finite-dimensional C* algebra with
atomic basis $\{x_1, \dots, x_n\}$ and $\omega$ a state on $A$. Then for any set of
numbers $\{a_i : i = 1, \dots, n;\ a_i > 0 \text{ and } \sum_i a_i \le 1\}$ we have

\[ \sum_i \omega(x_i)\, f\!\left( \frac{\omega(x_i)}{a_i} \right) \ge f(1). \]

Proof. Let $\omega(x_i) = p_i$. We have to show that $\sum_i p_i f(p_i/a_i) \ge f(1)$.
First assume that all $p_i > 0$ and $\sum_i a_i = 1$. Then

\[ \sum_i p_i f(p_i/a_i) = \sum_i a_i \, \frac{p_i}{a_i} f\!\left( \frac{p_i}{a_i} \right) \ge \Big( \sum_i p_i \Big) f\Big( \sum_i p_i \Big) = f(1) \]

by convexity of $x f(x)$. The general case can be proved by starting with the $a_i$
corresponding to $p_i > 0$ and adding extra $a_j$'s to satisfy $\sum_i a_i = 1$ if
necessary. The corresponding $p_j$ is set to 0. Now define a new function
$g(x) = x f(x)$, $x > 0$, and $g(0) = 0$. The conclusion of the lemma follows by arguing
as above with $g$. □

Using the lemma for the function $f(x) = \log x$ and Lemma 2 we easily deduce
the following.

Proposition 3 (Noiseless coding). Let $\mathscr{S}$ be a source with output
$O_\omega \in A$, a finite-dimensional C* algebra with atomic basis
$\{x_1, \dots, x_n\}$ (the alphabet). Let $g$ be a prefix-free code such that $g(x_i)$
is a tensor product of $k_i$ members of the code basis. Then

\[ \sum_i \omega(x_i)\, k_i \log d \ge -\omega(\log O_\omega), \]

where $d$ denotes the number of elements of the code basis.
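The noiseless coding bound can be illustrated in its standard form for a binary
code alphabet: the expected word length $\sum_i \omega(x_i) k_i$ of a
prefix-free code is at least the entropy in bits. The sketch below is an
illustration under assumptions: it uses the classical Shannon word lengths
$k_i = \lceil -\log_2 p_i \rceil$ (which satisfy the Kraft inequality), and it
evaluates the quantity of Lemma 3 with $f = \log_2$ and $a_i = 2^{-k_i}$. All
function names and the example distributions are hypothetical.

```python
import math

def entropy_bits(p):
    """Shannon entropy -sum_i p_i log2 p_i."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def shannon_lengths(p):
    """Binary word lengths k_i = ceil(-log2 p_i); they satisfy sum_i 2^-k_i <= 1."""
    return [math.ceil(-math.log2(pi)) for pi in p]

def expected_length(p, lengths):
    return sum(pi * k for pi, k in zip(p, lengths))

def lemma3_gap(p, lengths):
    """sum_i p_i log2(p_i / a_i) with a_i = 2^-k_i; Lemma 3 says this is >= log2(1) = 0.
    It equals expected length minus entropy."""
    return sum(pi * math.log2(pi / 2.0 ** (-k)) for pi, k in zip(p, lengths) if pi > 0)

for p in ([0.5, 0.25, 0.125, 0.125], [0.4, 0.3, 0.2, 0.1]):
    ks = shannon_lengths(p)
    L, H = expected_length(p, ks), entropy_bits(p)
    assert sum(2.0 ** (-k) for k in ks) <= 1.0   # Kraft inequality holds
    assert L >= H - 1e-12                        # noiseless coding bound
    print(ks, round(L, 4), round(H, 4), round(lemma3_gap(p, ks), 4))
```

For dyadic probabilities the bound is attained; otherwise the gap is strictly
positive but below one bit for this choice of lengths.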

Next we give a simple application of Theorem 5. First define a positive
functional Tr on a finite-dimensional abelian C* algebra $A$ with an atomic basis
$\{x_1, \dots, x_d\}$ by $\mathrm{Tr} = \omega_1 + \cdots + \omega_d$ where the
$\omega_i$ are the dual functionals. It is clear that Tr is independent of the choice
of atomic basis. Informally, the function Tr gives the dimension of a projection.

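Theorem 6 (Asymptotic Equipartition Property (AEP)). Let $\mathscr{S}$ be a source
with output $O_\omega = \sum_{i=1}^{n} \omega(x_i) x_i$ where $\omega$ is a state on the
finite-dimensional algebra $A$ with atomic basis $\{x_i\}$. Then given $\epsilon > 0$
there is a positive integer $n_0$ such that for all $n > n_0$

\[ P\left( \left| \tfrac{1}{n} \log_2(\otimes^n O_\omega) + H \right| > \epsilon \right) < \epsilon \]

where $H = -\omega(\log_2(O_\omega))$ is the entropy of the source and the probability
distribution is calculated with respect to the state
$\Omega_n = \omega \otimes \cdots \otimes \omega$ ($n$ factors) of $\otimes^n A$. If
$Q$ denotes the identity in the subalgebra generated by
$(\epsilon - |\log_2(\otimes^n O_\omega) + nH|)_+$ then

\[ (1 - \epsilon)\, 2^{n(H - \epsilon)} \le \mathrm{Tr}(Q) \le 2^{n(H + \epsilon)}. \]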

Before proving the theorem some explanations are necessary. First, $\log_2 x$
($= \ln x / \ln 2$) is usually defined for strictly positive elements of a C* algebra
(see footnote 7). We extend the definition to all non-zero $x \ge 0$. The standard
method of extending complex functions (continuous or analytic) to a C* algebra is via
functional calculus [KR97]. However, in our case it is simpler. Let $\{y_i\}$ be an
atomic basis in an abelian C* algebra. Let $y = \sum_i a_i y_i$ with $a_i \ge 0$. Then
define $\log_2 y = \sum_i b_i y_i$ where $b_i = \log a_i$ if $a_i > 0$ and 0 otherwise.
This definition implies that some standard properties of log are no longer true (e.g.
$2^{\log x} \neq x$). But in the present context it gives the correct result when we
take expectation values as in the formulas in the theorem. A somewhat longer
but mathematically better justified route is to "renormalize" the state. Thus
if $\omega(x_i) = 0$ for $k$ indices we define $\omega'(x_i) = \delta$ where $\delta$ is
arbitrarily small but positive and $\omega'(x_j) = \omega(x_j) - k\delta$ where
$\omega(x_j) > k\delta$. If we can prove the theorem for $\omega'$ then, since the
relations are valid in the limit $\delta \to 0$, we are done. We will not take this
path but implicitly assume that the probabilities are positive. Finally, note that the
element $Q$ is a projection on the subalgebra generated by
$(\epsilon 1 - |\log_2(\otimes^n O_\omega) + nH|)_+$. It corresponds to the set of
strings whose probabilities are between $2^{-nH - \epsilon}$ and $2^{-nH + \epsilon}$.
The integer $\mathrm{Tr}(Q)$ is simply the cardinality of this set.

7 Henceforth log will always be with respect to base 2 unless specified otherwise.

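Proof of the theorem. First note that $\log ab = \log a + \log b$ for elements
$a, b \ge 0$ in $A$. We can write $\otimes^n O_\omega = X_1 X_2 \cdots X_n$ where
$X_i = 1 \otimes 1 \otimes \cdots \otimes O_\omega \otimes 1 \otimes \cdots \otimes 1$
with $O_\omega$ in the $i$th place, so that
$\log(\otimes^n O_\omega) = \log X_1 + \cdots + \log X_n$. The fact that $\Omega_n$ is a
product state on $\otimes^n A$ (corresponding to a source whose successive outputs are
independent) implies that the $\log X_i$ are independent and identically distributed.
We can now apply the corollary to Theorem 5, yielding
$P(|\tfrac{1}{n}\log(\otimes^n O_\omega) - \Omega_n(\log X_1)| > \epsilon) = P(|\tfrac{1}{n}\log(\otimes^n O_\omega) - \omega(\log(O_\omega))| > \epsilon) < \epsilon$. □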
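The AEP can be observed numerically for a two-letter source. The sketch below is
an illustration under stated assumptions: the source emits independent bits with
$\omega(x_1) = p$, and the typical set is taken in the usual normalized form
$|-\tfrac{1}{n}\log_2 P(x) - H| \le \epsilon$, which is the form for which the
bounds $(1-\epsilon)2^{n(H-\epsilon)} \le \mathrm{Tr}(Q) \le 2^{n(H+\epsilon)}$
hold. The function name and the parameter values are hypothetical.

```python
import math

def typical_set_stats(p, n, eps):
    """Return (H, probability, size) of the set of strings x in {0,1}^n with
    |-(1/n) log2 P(x) - H| <= eps, for i.i.d. bits with P(1) = p."""
    q = 1.0 - p
    H = -(p * math.log2(p) + q * math.log2(q))
    prob, size = 0.0, 0
    for k in range(n + 1):                       # k = number of ones in the string
        log_px = k * math.log2(p) + (n - k) * math.log2(q)
        if abs(-log_px / n - H) <= eps:
            count = math.comb(n, k)              # number of strings with k ones
            size += count
            prob += 2.0 ** (math.log2(count) + log_px)   # work in logs to avoid underflow
    return H, prob, size

for n in (50, 200, 1000):
    H, prob, size = typical_set_stats(0.3, n, 0.1)
    # Upper bound on the cardinality holds for every n.
    assert math.log2(size) <= n * (H + 0.1)
    print(n, round(prob, 4), round(math.log2(size) / n, 4), round(H, 4))
```

The probability of the typical set approaches 1 as $n$ grows, while its
cardinality stays exponentially smaller than $2^n$.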

4.3 Communication Channels 

Every form of communication requires channels through which signals are sent 
and received. It is perhaps the most important component in the mathemati- 
cal models of communication. We will not be dealing with real channels which 
are complex physical objects — the atmosphere, a telephone cable, a bus on 
the mainboard of a computer are some examples. Our object is to give simple 
mathematical models of a channel which still yield interesting results relevant 
for concrete channels. The original paper of Shannon characterized channels by
a transition probability function. Thus, the channel (precisely a two-way channel)
has an input alphabet $X$ and output alphabet $Y$ and a sequence of random
functions $\phi_n : X^n \to Y^n$. The latter are characterized by probability
distributions $p_n(y^{(n)} | x^{(n)})$, the interpretation being:
$\phi_n(x^{(n)}) = y^{(n)}$ with conditional probability $p_n(y^{(n)} | x^{(n)})$. Note
that the distribution depends on the entire history. We say that such a channel has
(infinite) memory. A channel has finite memory if there is an integer $k \ge 0$ such
that if $x^{(n)} = x_n x_{n-1} \cdots x_{n-k+1} \cdots x_1$ then
$p_n(y^{(n)} | x^{(n)}) = p_n(y^{(n)} | x'^{(n)})$ for any string $x'^{(n)}$ of length
$n$ such that $x'_n = x_n, \dots, x'_{n-k+1} = x_{n-k+1}$. That is, the probability
distribution depends on the most recent $k$ symbols seen by the channel. A channel is
memoryless if $k = 1$. Since we will be dealing mostly with discrete memoryless
channels (DMS) this property will be tacitly assumed unless stated otherwise. In the
memoryless case it is easy to show the simple form of transition probabilities

\[ p_n(y^{(n)} | x^{(n)}) = p_n(y_1 \cdots y_n | x_1 \cdots x_n) = p(y_1 | x_1)\, p(y_2 | x_2) \cdots p(y_n | x_n). \tag{7} \]

This motivates us to define the channel transformation matrix $C(y_j | x_i)$ with
$y_j \in Y$ and $x_i \in X$. As before in this work $X$ and $Y$ will be finite sets.
Since the matrix $C(y_j | x_i)$ is supposed to represent the probability that the
channel outputs $y_j$ on input $x_i$ we must have $\sum_j C(y_j | x_i) = 1$ for all
$i$. In other words, the matrix $C(ij) = C(y_j | x_i)$ is row stochastic. This is the
standard formulation [Ash90, CT99, Khi57] (see footnote 8). We now turn to the
algebraic formulation. We restrict ourselves to two-terminal channels here.

Definition 4. A DMS channel is a triple $\mathcal{C} = \{X, Y, C\}$ where $X$ and $Y$
are abelian C* algebras of dimension $m$ and $n$ respectively and $C : Y \to X$ is a
unital positive map. The algebras $X$ and $Y$ will be called the input and output
algebras of the channel respectively. Given a state $\omega$ on $X$ we say that
$(X, \omega)$ is the input source for the channel.

We recall that a positive map $C : Y \to X$ is a linear map such that $C(y) \ge 0$
if $y \ge 0$. Sometimes we write the entries of $C$ in the more suggestive form
$C_{ij} = C(y_j | x_i)$ where $\{y_j\}$ and $\{x_i\}$ are atomic bases for $Y$ and $X$
respectively. Thus $C(y_j) = \sum_i C_{ij} x_i = \sum_i C(y_j | x_i) x_i$. Note that in
our notation $C$ is an $m \times n$ matrix. Its transpose $C^T_{ji} = C(y_j | x_i)$ is
the channel matrix in the standard formulation. We have to deal with the transpose
because the channel is a map from the output alphabet to the input alphabet. This may
be counterintuitive, but observe that any map $Y \to X$ defines a unique dual map
$S(X) \to S(Y)$ on the respective state spaces. Informally, a channel transforms a
probability distribution on the input alphabet to a distribution on the output. In
other words, given an input source there is a unique output source determined by
the channel. Let us note that in the case of abelian algebras every positive map
is guaranteed to be completely positive [Tak02]. This is no longer true in the
non-abelian case. Hence for the quantum case complete positivity has to be
explicitly imposed on (quantum) channels.
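The duality between the channel map on observables and the induced map on states
can be made concrete with a small matrix computation. The sketch below is an
illustration only: observables in $Y$ and $X$ are modelled as coefficient
vectors in the atomic bases, states as probability vectors, and `C[i][j]` stands
for $C(y_j | x_i)$. The function names and the numerical channel are
hypothetical.

```python
from fractions import Fraction as F

def apply_channel(C, y):
    """C(y) in X for an observable y in Y: the coefficient on x_i is sum_j C[i][j] y_j."""
    return [sum(cij * yj for cij, yj in zip(row, y)) for row in C]

def push_state(C, p):
    """Dual map S(X) -> S(Y): the output state has probabilities sum_i p_i C[i][j]."""
    return [sum(p[i] * C[i][j] for i in range(len(C))) for j in range(len(C[0]))]

def evaluate(p, a):
    """Value of the state with probability vector p on the observable a."""
    return sum(pi * ai for pi, ai in zip(p, a))

C = [[F(9, 10), F(1, 10)],     # assumed example: C[i][j] = C(y_j | x_i)
     [F(1, 5),  F(4, 5)]]
p = [F(1, 2), F(1, 2)]         # input state omega

# Unital: C maps the identity of Y to the identity of X (rows sum to 1).
assert apply_channel(C, [1, 1]) == [1, 1]

y = [F(3), F(-2)]              # an arbitrary observable in Y
# Duality: the output state evaluated on y equals omega(C(y)).
assert evaluate(push_state(C, p), y) == evaluate(p, apply_channel(C, y))
print(push_state(C, p))
```

Exact rational arithmetic is used so that the duality check is an equality
rather than a floating-point approximation.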

We characterize a channel by input/output algebras (of observables) and a
positive map. Like the source output we now define a useful quantity called
channel output. Corresponding to the atomic basis $\{y_i\}$ of $Y$ let
$\{\otimes^k y_{i(k)}\}$ be an atomic basis in $\otimes^k Y$. Here
$i(k) = (i_1 i_2 \cdots i_k)$ is a multi-index. Similarly we have an atomic basis
$\{\otimes^k x_{j(k)}\}$ for $\otimes^k X$. The level-$k$ channel output is defined
to be

\[ O^k_C = \sum_{i(k),\, j(k)} C^{(k)}(y_{i(k)} | x_{j(k)})\; y_{i(k)} \otimes x_{j(k)}. \tag{8} \]

Here $C^{(k)}$ represents the channel transition probability matrix on the $k$-fold
tensor product corresponding to strings of length $k$. In the DMS case it is
simply the $k$-fold tensor product of the matrix $C$. The channel output defined
here encodes the most important features of the communication process. First,
given the input source function
$O_{\omega^k} = \sum_i \omega^k(x_{i(k)})\, x_{i(k)}$ the output source
function is defined by

\[ O_{\tilde\omega^k} = I \otimes \mathrm{Tr}_{\otimes^k X}\big( (1 \otimes O_{\omega^k})\, O^k_C \big) = \sum_i \sum_j C(y_{i(k)} | x_{j(k)})\, \omega^k(x_{j(k)})\, y_{i(k)}. \tag{9} \]

Here, the state $\tilde\omega^k$ on the output space $\otimes^k Y$ can be obtained via
the dual $\tilde\omega^k(y) = C^k(\omega^k)(y) = \omega^k(C^k(y))$. The formula above is
an alternative representation which is very similar to the quantum case. The joint
output of the channel can be considered as the combined output of the two terminals of
the channel. This is obtained by not tracing out over the input in equation (9).
Thus the joint output is

\[ J^k = (1 \otimes O_{\omega^k})\, O^k_C = \sum_{ij} \Omega^k(y_{i(k)} \otimes x_{j(k)})\; y_{i(k)} \otimes x_{j(k)} \quad \text{with} \quad \Omega^k(y_{i(k)} \otimes x_{j(k)}) = C(y_{i(k)} | x_{j(k)})\, \omega(x_{j(k)}). \tag{10} \]

8 In this work we will not deal with channel coding and decoding. Including these concepts
is not difficult but complicates the notation.
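For a memoryless channel the level-$k$ quantities are tensor (Kronecker) powers
of the level-1 ones, and the output source function is the marginal of the joint
output. The sketch below illustrates this with plain nested lists; it assumes a
product input state on strings of length $k$, stores `C[i][j]` as
$C(y_j | x_i)$, and uses hypothetical helper names throughout.

```python
from fractions import Fraction as F

def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def kron_power(C, k):
    """k-fold Kronecker power: transition matrix for strings of length k."""
    M = [[F(1)]]
    for _ in range(k):
        M = kron(M, C)
    return M

def vec_kron(u, v):
    """Product state on pairs, in the same index ordering as kron."""
    return [a * b for a in u for b in v]

def joint_state(C, p):
    """joint[i][j] = C(y_j | x_i) * omega(x_i), the coefficients of the joint output."""
    return [[p[i] * C[i][j] for j in range(len(C[0]))] for i in range(len(p))]

def output_distribution(C, p):
    """Trace out the input: sum the joint coefficients over i."""
    J = joint_state(C, p)
    return [sum(J[i][j] for i in range(len(p))) for j in range(len(C[0]))]

C = [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]]   # assumed example channel
p = [F(1, 2), F(1, 2)]                            # assumed input state

C2, p2 = kron_power(C, 2), vec_kron(p, p)
assert all(sum(row) == 1 for row in C2)           # still row stochastic
# Memoryless channel + product input: the level-2 output state is a product.
out1 = output_distribution(C, p)
assert output_distribution(C2, p2) == vec_kron(out1, out1)
print(out1, output_distribution(C2, p2))
```

Summing the joint coefficients over the output index instead returns the input
state, so both source functions are recovered from the joint output.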

Let us analyze the algebraic definition of channel given above. For simplicity of
notation, we restrict ourselves to level 1. The explicit representation of channel
output is

\[ O_C = \sum_i \sum_j C(y_i | x_j)\; y_i \otimes x_j. \]

We interpret this as follows: if on the channel out-terminal $y_i$ is observed
then the input could be $x_j$ with probability
$C(y_i | x_j)\,\omega(x_j) / \sum_l C(y_i | x_l)\,\omega(x_l)$.
Now suppose that for a fixed $i$, $C(y_i | x_j) = 0$ for all $j$ except one, say $j_i$.
Then on observing $y_i$ at the output we are certain that the input is $x_{j_i}$. If
this is true for all values of $y$ then we have an instance of a lossless channel. It
is easy to write the channel matrix in this case. Thus, given $1 \le j \le n$ let $d_j$
be the set of integers $i$ for which $C(y_i | x_j) > 0$. The lossless property implies
that the $\{d_j\}$ form a partition of the set $\{1, \dots, m\}$. The corresponding
channel output is

\[ O_C = \sum_j \Big( \sum_{i \in d_j} C(y_i | x_j)\, y_i \Big) \otimes x_j. \]

Clearly lossless channels are the most useful for communication of information.
At the other extreme is the useless channel in which there is no correlation
between the input and the output. To define it formally, consider a channel
$\mathcal{C} = \{X, Y, C\}$ as above. The map $C$ induces a map
$C' : Y \otimes X \to X$ defined by $C'(y \otimes x) = x\,C(y)$. Given a state $\omega$
on $X$ the dual of the map $C'$ defines a state $\Omega_C$ on $Y \otimes X$:
$\Omega_C(y \otimes x) = \omega(C'(y \otimes x)) = C(y | x)\,\omega(x)$. We call
$\Omega_C$ the joint (input-output) state of the channel. A channel is useless if $Y$
and $X$ (identified as $Y \otimes 1$ and $1 \otimes X$ resp.) are
$\Omega_C$-independent.

9 We called this the source output before. But as the channel has two terminals we call it 
input source function to avoid confusion. 
Lemma 4. A channel $\mathcal{C} = \{X, Y, C\}$ with input source $(X, \omega)$ is
useless iff the matrix $C_{ij} = C(y_j | x_i)$ is of rank 1.

Proof. Suppose $\mathcal{C}$ is useless. Note that $\Omega_C(1 \otimes x) = \omega(x)$
and $\Omega_C(y \otimes 1) = \tilde\omega(y)$ where $\tilde\omega(y) = \omega(C(y))$ is
the image of $\omega$ under the dual of the map $C$. Then $\Omega_C$-independence
implies $C(y_j | x_i)\,\omega(x_i) = \omega(x_i)\,\tilde\omega(y_j)$. We may assume
that all $\omega(x_i) > 0$ (otherwise we just discard it). Hence,
$C(y_j | x_i) = \tilde\omega(y_j)$ and this proves necessity. Now if $C_{ij}$ has
rank 1 then all the rows are non-zero multiples of any one row, say the first. Since
$C$ is a row stochastic matrix the rows must be identical, that is,
$C_{ij} = a_j = \tilde\omega(y_j)$ and independence is trivially verified. □
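The two extreme cases, lossless and useless channels, are easy to test on a
channel matrix. The sketch below is an illustration with hypothetical function
names; `C[i][j]` stands for $C(y_j | x_i)$, the lossless test checks that every
output letter is reachable from exactly one input letter, and the useless test
uses the criterion of Lemma 4 (for a row stochastic matrix, rank 1 means all
rows coincide).

```python
from fractions import Fraction as F

def is_lossless(C):
    """Each output y_j has exactly one input x_i with C(y_j | x_i) > 0."""
    return all(sum(1 for row in C if row[j] > 0) == 1 for j in range(len(C[0])))

def is_useless(C):
    """Row stochastic and rank 1: all rows are identical (Lemma 4)."""
    return all(row == C[0] for row in C)

def is_independent(C, p):
    """Check Omega_C(y_j (x) x_i) == omega(x_i) * omega~(y_j) for all i, j."""
    out = [sum(p[i] * C[i][j] for i in range(len(p))) for j in range(len(C[0]))]
    return all(p[i] * C[i][j] == p[i] * out[j]
               for i in range(len(p)) for j in range(len(C[0])))

lossless = [[F(1, 2), F(1, 2), F(0)],    # assumed examples
            [F(0),    F(0),    F(1)]]
useless = [[F(1, 3), F(2, 3)],
           [F(1, 3), F(2, 3)]]
p = [F(1, 2), F(1, 2)]

for name, C in (("lossless", lossless), ("useless", useless)):
    print(name, is_lossless(C), is_useless(C), is_independent(C, p))
```

With a strictly positive input state, `is_independent` agrees with `is_useless`,
as the lemma asserts.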

The definition of a useless channel captures the intuition that if there is 
no correlation between the input and output then we can recover practically 
nothing. The channel coding theorem asserts that apart from this extreme case 
we can decode the output to recover a large portion of the input with high 
probability of success. The algebraic version of the channel coding theorem 
assures that it is possible to approximate, in the long run, an arbitrary channel 
(excepting the useless case) by a lossless one. 

Theorem 7 (Channel coding). Let $\mathcal{C}$ be a channel with input algebra $X$ and
output algebra $Y$. Let $\{x_i\}_{i=1}^{n}$ and $\{y_j\}_{j=1}^{m}$ be atomic bases
for $X$ and $Y$ resp. Given a state $\omega$ on $X$, if the channel is not useless then
for each $k$ there are subalgebras $Y_k \subset \otimes^k Y$,
$X_k \subset \otimes^k X$, a map $C_k : Y_k \to X_k$ induced by $C$ and a
lossless channel $L_k : Y_k \to X_k$ such that

\[ \lim_{k \to \infty} \Omega(|O_{C_k} - O_{L_k}|) = 0 \quad \text{on } T_k = Y_k \otimes X_k. \]

Here $\Omega = \otimes^\infty \Omega_C$ and on $\otimes^k Y \otimes \otimes^k X$ it acts
as $\Omega_k = \otimes^k \Omega_C$ where $\Omega_C$ is the state induced by the channel
and a given input state $\omega$. Moreover, if $r_k = \dim(X_k)$ then
$R = \frac{\log r_k}{k}$, called the transmission rate, is independent of $k$.

First let us clarify the meaning of the above statements. The theorem simply
states that on the chosen set of codewords the channel output of $C_k$ induced by
the given channel can be made arbitrarily close to that of a lossless channel $L_k$.
Since a lossless channel has a definite decision scheme for decoding, the choice
of $L_k$ is effectively a decision scheme for decoding the original channel's output
when the input is restricted to our "code-book". This in turn implies that the
probability of error tends to 0.

Proof. From an atomic basis of $\otimes^k X$ choose a subset $A_k$ of cardinality $r_k$
(to be determined). Let $X_k$ be the subalgebra generated by $A_k$. Write $C^{(k)}$ for
the $k$-fold tensor product of $C$. Let $Q_k$ be the identity on $X_k$ (it is the sum of
all the members of $A_k$). For an atomic basis $B_k$ of $\otimes^k Y$ let $B'_k$ be the
subset such that $C^{(k)}(y)\,Q_k \neq 0$ for $y \in B'_k$. Let $Y_k$ be the subalgebra
generated by $B'_k$ and let $C_k : Y_k \to X_k$ denote the linear map
$C_k(y) = Q_k C^{(k)}(y)$. Informally, if we restrict the messages to observables in
$A_k$ then the output algebra is $Y_k$. The new channel map is $C_k$. We now have a new
channel $\mathcal{C}_k = (X_k, Y_k, C_k)$. Throughout the rest of the proof we will
assume that we are working in $T_k$ with the appropriate maps. We next define $L_k$ as
follows. For $y_i \in B'_k$ let $C_k(y_i) = \sum_j C_k(y_i | x_j)\, x_j$, $x_j \in A_k$.
Let $C_k(y_i | x_{j_i})$ be the maximum of $C_k(y_i | x_j)$ for fixed $i$ (if there is
more than one index equal to this maximum choose one arbitrarily). Let
$L_k(y_i) = C_k(y_i | x_{j_i})\, x_{j_i}$. The map $L_k$ is not unital. Strictly
speaking $L_k$ is not a channel map as we have defined above. However, as we see
below, $O_{L_k}$ does approximate $O_{C_k}$ in $T_k$ with small error. What this means
is that with high probability we can correctly associate a unique and correct input
to a given channel output (see footnote 10). The non-unital property of $L_k$ is
reflective of the situation in which some of the original messages outside of $X_k$
may end up in $Y_k$. Set $r_k = 2^{kR}$ and let

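\[ O_{\tilde\omega_k} = \sum_{y \in B'_k} \tilde\omega(y)\,(y \otimes 1) \quad \text{and} \quad O_{\omega_k} = \sum_{x \in A_k} \omega(x)\,(1 \otimes x). \]

Here $O_{\omega_k}$ and $O_{\tilde\omega_k}$ are respectively the input and output
source functions for the channel $\mathcal{C}_k$. Let $Z_k$ be the identity on the
ideal generated by
$(\log O_{C_k} - \log O_{\tilde\omega_k} - k(R + \epsilon))_+ = (\log(O_{C_k} O_{\tilde\omega_k}^{-1}) - k(R + \epsilon))_+$,
$\epsilon > 0$, in $T_k$ (see footnote 11). Note that
$O_{C_k} = O_{\Omega_k} O_{\omega_k}^{-1}$ on $T_k$. Since

\[ Z_k \big| O_{C_k} O_{\tilde\omega_k}^{-1} - 2^{k(R+\epsilon)} \big| = \big( O_{C_k} O_{\tilde\omega_k}^{-1} - 2^{k(R+\epsilon)} \big)_+ = Z_k \big( O_{\Omega_k} O_{\omega_k}^{-1} O_{\tilde\omega_k}^{-1} - 2^{k(R+\epsilon)} \big) \ge 0 \]

and $Z_k^2 = Z_k$ we conclude that
$Z_k O_{\tilde\omega_k} \le Z_k O_{\Omega_k} O_{\omega_k}^{-1}\, 2^{-k(R+\epsilon)} \le Z_k\, 2^{-k(R+\epsilon)}$.
The last inequality follows from the fact that $O_{\Omega_k} O_{\omega_k}^{-1} \le 1$.
We also have $O_{L_k} \le O_{C_k}$ and $\Omega_k(Z_k) = \mathrm{Tr}(Z_k O_{\Omega_k})$.
The last fact is true for any projection as can be verified using an atomic basis. We
now have

\[ \Omega(Z_k |O_{C_k} - O_{L_k}|) = \Omega_k(Z_k (O_{C_k} - O_{L_k})) \le \Omega_k(Z_k) = \mathrm{Tr}(Z_k O_{\Omega_k}) \le \mathrm{Tr}(Z_k O_{\tilde\omega_k}) \le 2^{-k(R+\epsilon)}\, \mathrm{Tr}(Z_k) \le 2^{-k(R+\epsilon)}\, r_k = 2^{-k\epsilon}. \]

Hence $\Omega(Z_k |O_{C_k} - O_{L_k}|) = \Omega_k(Z_k (O_{C_k} - O_{L_k})) \to 0$ as
$k \to \infty$. To complete the proof we look at the complementary part:
$(1_k - Z_k)|O_{C_k} - O_{L_k}|$ where $1_k$ is the identity in $T_k$. Consider the
projection $1_k - Z_k$. $Z_k$ is the identity in the annihilating ideal of $F_{k-}$
where $F_k = (\log O_{C_k} - \log O_{\tilde\omega_k} - k(R + \epsilon))$. Let
$G_k = (\log(\otimes^k O_C O_{\tilde\omega}^{-1}) - k(R + \epsilon)1)$. Then since $F_k$
is the restriction of $G_k$ to a subspace, $G_k = F_k + F'_k$ where
$F'_k \in \otimes^k Y \otimes \otimes^k X$ with $F_k F'_k = 0$ (we use the fact that
the channel is memoryless). Hence, taking an approximating polynomial sequence,
$G_{k-} = F_{k-} + F'_{k-}$. It follows that $F_{k-} \le G_{k-}$ and $Z'_k$, the
identity on the annihilating ideal of $G_{k-}$, satisfies $Z'_k \le Z_k$. This implies
$\Omega(1_k - Z_k) \le \Omega(1 - Z'_k)$. By definition
$\Omega(1 - Z'_k) = P(\log(\otimes^k O_C O_{\tilde\omega}^{-1})/k - (R + \epsilon) < 0)$
is the probability that $\log(\otimes^k O_C O_{\tilde\omega}^{-1})/k < R + \epsilon$.
But

\[ \Omega\big( \big| \log(\otimes^k O_C O_{\tilde\omega}^{-1})/k - \Omega(\log O_C - \log O_{\tilde\omega})\,1 \big| \big) \to 0 \quad \text{as } k \to \infty \]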

10 We have combined two types of decoding scheme: the ideal observer decoding [Ash90]
and typical set decoding [CT99].

11 This ideal is generated by $(\log(O_{C_k} O_{\tilde\omega_k}^{-1}) - k(R + \epsilon))_+$.
Note that we write the scalar $k(R + \epsilon)$ instead of the more accurate
$k(R + \epsilon)1_k$ where $1_k$ is the unit in $T_k$.
follows from the law of large numbers (see Theorem 5 and its corollary). The
quantity $I(X, Y) = \Omega(\log O_C - \log O_{\tilde\omega}) = H(Y) - H(Y|X)$ is defined
as the mutual information between the input and output algebras and $H(Y|X)$ is the
conditional entropy. Thus if we have $R < I(X, Y)$, say $R \le I(X, Y) - 2\epsilon$,
then
$\Omega(1 - Z'_k) = P(\log(\otimes^k O_C O_{\tilde\omega}^{-1})/k - (R + \epsilon) < 0) \le P(|\log(\otimes^k O_C O_{\tilde\omega}^{-1})/k - I| > \epsilon)$
but the latter $\to 0$. Putting it all together we have for any $\epsilon > 0$ and
$R < I - 2\epsilon$

\[ \Omega((1_k - Z_k)|O_{C_k} - O_{L_k}|) \le \Omega(1_k - Z_k) \le \Omega(1 - Z'_k) \le P(|\log(\otimes^k O_C O_{\tilde\omega}^{-1})/k - I| > \epsilon) \to 0 \quad \text{as } k \to \infty. \]

As we already have $\Omega(Z_k |O_{C_k} - O_{L_k}|) \to 0$ the proof is complete. □

The channel coding theorem implies that it is possible to choose a set of
"codewords" which can be transmitted with high reliability. It is easy to see that
for a lossless channel the input entropy $H(X)$ is equal to the mutual information.
We may think of this as conservation of entropy or information, which justifies
the term "lossless". Since it is always the case that $H(X) - H(X|Y) = I(X, Y)$,
the quantity $H(X|Y)$ can be considered the loss due to the channel. The channel
coding theorem is perhaps the most celebrated theorem in Shannon's work although
his proof was not rigorous. The algebraic version of the theorem serves
two primary purposes. First, we attempt to make the proof as "algebraic"
as possible. More importantly, it gives us the commutative perspective from
which we will seek possible extensions to the non-commutative case. Secondly,
the channel map $L$ can be used for a decoding scheme. Thus we may think of a
coding-decoding scheme for a given channel as a sequence of pairs $(X_k, L_k)$ as
above.
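The identities $I(X, Y) = H(Y) - H(Y|X)$ and, for a lossless channel,
$I(X, Y) = H(X)$ can be checked on small examples. The sketch below is an
illustration only; `C[i][j]` stands for $C(y_j | x_i)$, and the function names
and the three example channels are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def mutual_information(C, p):
    """I(X,Y) = H(Y) - H(Y|X) for channel matrix C and input distribution p."""
    out = [sum(p[i] * C[i][j] for i in range(len(p))) for j in range(len(C[0]))]
    h_y_given_x = sum(p[i] * entropy(C[i]) for i in range(len(p)))
    return entropy(out) - h_y_given_x

p = [0.5, 0.5]
lossless = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]   # each output determines the input
useless = [[0.25, 0.75], [0.25, 0.75]]          # identical rows (rank 1)
noisy = [[0.9, 0.1], [0.1, 0.9]]                # binary symmetric channel

print(mutual_information(lossless, p), entropy(p))   # equal: no information is lost
print(mutual_information(useless, p))                # zero: output carries no information
print(mutual_information(noisy, p))                  # strictly between 0 and H(X)
```

The difference $H(X) - I(X, Y) = H(X|Y)$ in the third case is the loss due to
the channel in the sense described above.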

The coding theorems can be extended to more complicated scenarios like
ergodic sources and channels with finite memory. The converse of the channel
coding theorem is also true: roughly, any such coding scheme with error tending to 0
(convergence in probability) must have rate $\log r_k / k \le I$. We will
not pursue these issues further here. But we are confident that these
generalizations can be appropriately formulated and proved in the algebraic framework.

5 Conclusion and preview of the future work 

In the preceding sections we have laid the basic algebraic framework for
information theory. This work was devoted to the classical parts of information theory
corresponding to abelian algebras. Since information theory relies heavily on
probabilistic concepts we devoted a major part of the paper to algebraic
probability theory. Although we often confined our discussion to finite-dimensional
algebras corresponding to finite sample spaces, it is possible to extend it to
infinite-dimensional algebras of continuous sample spaces. In this regard, a
natural question is: can the algebraic formulation replace Kolmogorov axiomatics
based on measure theory? Naively, the answer is no because the assumption
of a norm-complete algebra imposes the restriction that the random variables
that its elements represent must be bounded. Moreover, the GNS construction implies
that the algebraic framework is essentially equivalent to (almost) bounded
random variables on a locally compact space. In order to deal with the unbounded
case we have to go beyond the normed algebra structures. A possible course of
action is indicated in the examples given in section 3.3 via the use of a
"cut-off". A more general approach would be to consider sequences which converge
in a topology weaker than the norm topology to elements of a larger algebra.
These and other related issues on foundations are deep and merit a separate
investigation.

The second major theme of this paper is information theory in the algebraic
framework. As some of the most important results of information theory concern
finite or discrete alphabets we have primarily dealt with these cases only. In
this context, we can treat ergodic sources, channels with finite memory and
multi-terminal channels. These topics will be investigated in the future in the
non-commutative setting. However, let us recall one of the principal motivations
of this paper: the construction of a single framework for dealing with quantum
and classical information. We have seen that the algebraic theory in the
commutative case already indicates the close analogies between the two cases.
We will delve deeper into these analogies and aim to throw light on some basic
issues like quantum Huffman coding [BFGL00], channel capacities and general
no-go theorems among others, once we formulate the appropriate models. In
this context, let us mention that many investigators have recognized the
importance of the algebraic framework but a comprehensive algebraic model which
can be extended to the infinite-dimensional case is lacking. We aim to address these
important issues in subsequent work.